This report covers two separate but parallel efforts in system software
debugging studies. The first study describes a theoretical approach to
defining new resource management algorithms that can be used in an
operating system. The second study describes an experimental work station
for performing sophisticated operating system software debugging. The
actual experimentation is performed by connecting a PDP-11/40 to a PDP-10
through a high speed data bus and installing a virtual machine monitor on
the PDP-11/40.
EVALUATION


The  work  described  in  this  report  and  performed  during  this  effort 
represents  a significant  accomplishment  in  the  structuring  of 
resource  management  algorithms  for  operating  systems.  First  a 
theoretical  base  was  described  that  is  a fresh  and  general  view 
of  the  resource  management  problems  in  complex  operating  systems. 
Second, an experimental facility was built to examine various
resource  management  policies. 

A critical  component  of  this  facility,  a virtual  machine  monitor, 
represents  the  resource  management  portion  of  an  operating  system. 
This facility allows one to virtualize a complete computer system
at  a work  station  without  having  extensive  peripherals  at  the 
work  station. 

This  effort  demonstrates  the  types  of  capabilities  that  will 
compose  future  operating  systems. 
Preface


This  final  report  on  Contract  No.  F30602-77-C-0058  is  composed 
of  two  parts.  The  first  part  documents  research  on  novel 
resource management algorithms used in computing systems.
Specifically, these are the CPU and memory management
algorithms that are at the heart of the resource allocation function
of operating systems.

The  contribution  made  by  this  research  is  of  a fundamental 
nature  and  we  expect  that  it  will  revolutionize  the  design 
of operating systems in the future. In particular, a key
contribution made by this research is a unification of the
page replacement policies in operating systems. In fact, we
present  here  a result  that  shows  that  many  of  the  commonly 
accepted  page  replacement  policies  can  be  considered  a 
special  case  of  the  more  general  policy  based  on  an  ARIMA 
(1,1,1)  model  proposed  by  this  work. 

The formulation of a unifying theory, which shows that previously
known cases are special cases of a more general case, not only
results  in  conceptual  clarity  and  economy,  but  it  actually 
sheds  further  light  on  the  conditions  where  the  special  cases 
are  really  applicable. 


An  additional  fall  out  of  this  research,  and  a very  significant 
one,  is  that  it  becomes  possible  to  design  operating  system 
memory  management  policies  which  are  capable  of  adaptation. 

The point is that the general unified theory presented in this
report supplies two key factors: first, as stated above, an
understanding of the conditions under which specific memory
management policies are optimal; and second, a general algorithm
which comprises these policies as special cases. Therefore, one
is in the position to
design  an  operating  system  which  uses  a generalized  memory 
management  policy  capable  of  adapting  to  conditions.  In 
other  words,  the  ARIMA  (1,1,1)  page  replacement  algorithm 
could  be  embodied  in  such  a computing  system  and  the  algorithm 
would  automatically  specialize  to  its  extreme  special  cases 
when  the  operating  conditions  warrant. 

Therefore, the key contribution of the first part of this report
is a fresh, much more powerful and general view of the resource
management problem in computing systems. This contribution
capitalizes  on  the  very  extensive  and  solid  body  of  mathematical 
theory  that  has  been  developed  by  the  control  theoreticians. 

The  second  part  of  the  report  presents  work  on  new  notions 
on  the  organization  of  a debugging  environment.  The  basic 
principle  here  is  to  give  the  person  doing  the  debugging  of 
complex  software  an  inexpensive  work  station  with  a number  of 
tools  which  enable  him  to  obtain  extensive  snapshots  of  the 
state  of  a complex  software  system. 
The reason why the work station can be inexpensive and still
offer  very  powerful  status  capturing  and  reporting  capabilities 
is  that  the  work  station  obtains  most  of  its  peripherals,  through 
a novel  virtualization  technique,  from  the  host  computer. 

Since the work station is capable of transferring large sections
of the main computer memory at a very high data rate, it has the
ability to capture extensive status information and report it
quickly.

These  two  parallel  efforts  were  undertaken  in  this  program, 
because  the  experimental  apparatus  needed  to  verify  the 
theories  presented  in  Part  I essentially  supplies  the  complete 
underlying structure for performing the exploration in Part II.

An  inexpensive  experimental  facility  was  built  by 
connecting  a PDP-11/40  to  a PDP-10  through  a very  high  speed 
data  bus  and  by  installing  on  the  PDP-11/40  a virtual  machine 
monitor.  This  kind  of  facility  is  of  course  a very  inexpensive 
and  very  convenient  facility  for  exploring  alternative  memory 
management  policies.  In  fact,  the  virtual  machine  monitor 
is  essentially  the  resource  management  function  of  an  operating 
system  and  the  high  speed  data  bus  allows  one  to  virtualize 
a complete  computer  at  the  work  station  without  having  to  have 
any  expensive  peripherals  at  the  work  station. 
Once this facility for experimenting on operating system resource
management  algorithms  had  been  put  in  place,  it  became  very 
natural  to  use  it  for  exploring  approaches  to  debugging  of 
complex  systems  running  on  the  host  processor  and  this  is 
exactly  the  work  that  is  reported  in  Part  II  of  this  report. 
This thesis proposes the application of control theory to the
dynamic  optimization  of  computer  systems  performance.  Until  now, 
queueing  theory  has  been  extensively  used  in  the  evaluation  and  modeling 
of  computer  systems.  It  is  a good  design  and  static  analysis  tool. 
However,  it  provides  little  run  time  guidance.  For  dynamic  (run  time) 
optimization  we  need  to  exploit  modern  control  theoretic  techniques  such 
as  state  space  models,  stochastic  filtering  and  estimation,  time  series 
analysis,  etc.  In  this  thesis,  a general  control  theoretic  approach  is 
proposed  for  the  formulation  of  operating  systems  resource  management 
policies.  The  approach  is  exemplified  by  formulating  policies  for  CPU 
and memory management.

The  problem  of  CPU  management  is  that  of  deciding  which  task  from 
among  a set  of  ready  tasks  should  be  run  next.  The  main  problem 
encountered  in  the  practical  implementation  of  theoretically  optimal 
algorithms  is  that  the  service-time  requirements  of  tasks  are  unknown. 
The proposed solution is to model the CPU demand as a stochastic
process, and to predict the future demands of a job from its past
behavior. Several analytical results concerning the effect of
prediction  errors  are  derived.  An  empirical  study  of  program  behavior 
is  made  to  find  a suitable  predictor.  Several  different  models  are 
compared.  Finally,  it  is  shown  that  a zeroth  order  autoregressive 
moving  average  model  is  the  most  appropriate  one.  Based  on  this 
observation  an  adaptive  scheduling  algorithm  called  "SPRPT"  (Shortest 
Predicted  Remaining  Processing  Time)  is  proposed. 
The problem of memory management is also formulated as the problem
of predicting future page references from past program behavior. Using
a zero-one stochastic process model for page references, it is shown
that the process is non-stationary. Empirical analysis is presented to
show that the page reference pattern can be satisfactorily modeled by an
autoregressive integrated moving average model of order 1,1,1. A two
stage exponential predictor is derived for the model. Based on this
predictor a new algorithm called "ARIMA Page Replacement Algorithm" is
proposed. This algorithm is shown to be easy to implement. It is shown
that many conventional page replacement algorithms, including Working
Set, are merely boundary cases of the ARIMA algorithm. The conditions
under  which  these  conventional  algorithms  are  optimal  are  described. 
The  limitations  of  the  formulation  and  possible  directions  for  future 
extensions  are  also  discussed. 


The ARIMA model does not take into account the fact that a binary
process takes only two values, 0 or 1. This discrepancy is removed by
developing Boolean models for such processes. It is shown that if a
binary process is Markov of a finite known order, it can be modeled as
the output of a Boolean (switching) system driven by a set of binary
white noises. Modeling, estimation, and prediction of the process using
the Boolean model are described. A method is developed for optimal
non-linear prediction under any given non-linear cost criterion. All
the results are then generalized to k-ary processes, i.e., processes
which take integer values between 0 and k-1. Finally, the application
of the model to the problem of memory management is described.
Research on Resource Management
Algorithms Used in Computing Systems
Introduction
Control-theoretic  view 

Conventionally  an  operating  system  is  defined  as  the  set  of 
computer  program  modules  which  control  the  allocation  and  use  of 
equipment  resources  such  as  the  central  processing  unit  (CPU),  main 
memory, secondary storage, I/O devices and files [MaD74]. These
programs  resolve  conflicts,  attempt  to  optimize  performance,  and 
interface  between  the  user  program  and  computer  resources  (hardware  and 
system  software). 
For a control theorist, an operating system is a set of
controllers, each of which exercises control over the allocation of some
system resource. The goal of each controller is to optimize system
performance while operating within the constraints of resource
availability.

Figure 1.1 shows some of the components of an operating system.
The controllers are represented by circles. The "load controller"
controls the number of jobs allowed to log in. The job controller (job
scheduler, or high level dispatcher) controls the transfer of jobs from
the "submitted" queue to the "ready" queue. This decision is based upon
the availability of resources like memory, magtapes, etc. The CPU
controller (task dispatcher, or low level scheduler) controls the
allocation of the CPU. It selects a task from the set of ready tasks
and allows it to run. The paging controller (page replacement
algorithm, or memory management algorithm) controls the transfer of
pages from virtual memory (disk or drum) to primary memory, and so on.

The control components of an operating system are not much
different from those of other systems, except, probably, in that they
are non-mechanical. Obviously, there is much that can be gained from
control theory in the design and modeling of these components.
Unfortunately, very little control theory has been used for this purpose
so far. Compared with the highly developed theory of control systems,
most control algorithms used in operating systems today are "primitive".

1.2 QUEUEING-THEORETIC VIEW OF AN OPERATING SYSTEM

Most  models  of  computer  systems  used  today  are  queueing-theoretic. 
From  a queueing-theoretic  viewpoint,  each  controller  of  the  operating 
system  is  a server.  Thus,  an  operating  system  is  a queueing  network. 
One  very  popular  queueing  model,  called  "Central  Server  Model",  is  shown 
in Figure 1.2. In this figure, circles represent servers and rectangles
indicate the location of queues. Such queueing models have been used to
explain many phenomena occurring in computer systems [Buz71]. Typical
questions that have been answered using this approach are the
following:

1.  What  is  the  average  throughput? 

2. What is the average utilization of the CPU, I/O devices, etc.?

3. What is the average response time?

4. What is the bottleneck in the system (would a higher speed disk do
better)?

5.  What  is  the  optimal  degree  of  multiprogramming? 

A vast amount of literature has been published to answer these and
similar questions under a variety of assumptions, restrictions and
generalizations. For chronological surveys and bibliographies see
[McK69, Mun75, Kle76, LiC77]. Some of the issues investigated are the
following:

1.  Service  Discipline:  M/M/1,  G/M/1,  M/G/1,  FCFS,  or  priority 
service,  e.g.,  see  [Shu76]. 

2. Types of jobs: one or many classes [BCM75].

3. Devices included: terminals only [Sch67], terminals and I/O
devices [Buz71].

4. State dependent or stationary probabilities [Che75].

5. Exact or approximate solutions [GaS73, Oel75, CHW75].

6.  Part  by  part  (hierarchical)  solutions  or  whole  solution  [BCB75]. 

In spite of the wide applications of queueing theory, there are
some inherent limitations to its usefulness.

1.3 LIMITATIONS OF QUEUEING THEORY

Queueing theory represents only average statistics. It tries to
represent a number of jobs by the average characteristics of the class.
The "individuality" of a job is ignored. In this sense, it is a static
analysis. It cannot satisfactorily represent time varying phenomena or
dynamics. Therefore, it is good only as a design time tool. It cannot
be used at operation time, for which we need adaptive techniques that
can adapt to the individual characteristics and time-varying behavior of
jobs. To give a concrete example, a queueing model is ideal for telling
whether the disk is the bottleneck in the system or whether a faster CPU
will increase efficiency (both design time questions). However, once we
have acquired the proper disk and CPU, it does not tell us which job
from a given set of jobs should be given the CPU or the disk next.
This is a dynamic decision problem, which can only be solved by the
application of techniques from decision and control theory.


Queueing  theory  is  good  for  modeling  a computer  system  and,  to  a 
certain  extent,  its  subsystems.  However,  when  we  come  down  to  the  level 
of  a program,  it  cannot  model  its  behavior  (because  there  are  no  queues 
to  be  modeled).  Given  all  the  known  information  about  a program,  it 
cannot  tell  what  the  program  behavior  is  likely  to  be  in  the  near 
future.  This  is  a prediction  problem.  Again,  control  theory  must  be 
used  for  this  purpose. 

Queueing theory cannot model the interaction between the space and
time demands of a program. Since the theory cannot model either the
space demand behavior of a program or its time demand behavior, it
certainly is inadequate for modeling the interaction between the two.
Bad memory management may cause frequent page faults and may degrade the
performance of an otherwise good scheduling policy. Still, the memory
and the CPU allocation policies of most operating systems to date are
more or less independent. This is due to a lack of clear understanding
of the interaction between them. With the application of control theory
we hope to remedy this situation, because, given control-theoretic
models of two systems, their joint model can be obtained by modeling the
cross-correlation between the two.

1.4 ADDITIONAL EXPECTATIONS FROM CONTROL THEORY

There are many concepts like stability, controllability, and
parameter sensitivity, that are well established in control theory but
have not been used in computer systems modeling. We hope that the
control-theoretic approach will eventually lead to a better
understanding of these concepts as applied to computer systems. For
example, take the concept of stability. Instability in computer systems
occurs in the form of excessive overhead caused by frequent switching of
the CPU between jobs, or by frequent oscillation of pages between main
and secondary memory. The control-theoretic approach is
especially suitable for stability studies, e.g., for determining the
effect of sudden demand variations, or the effect of measurement delays.
There are well established techniques for this purpose.

Controllability  studies  of  computer  systems  could  similarly  help 
us  to  determine  whether  it  is  possible  to  reach  the  optimum  performance 
state.  Parameter  sensitivity  is  already  a big  issue  even  in  current 
queueing models. One of the major studies that investigated the
applicability of queueing models to a real interactive system was
conducted by Moore at the University of Michigan [Moo71]. One
conclusion of the study was that queueing models are very sensitive to
parameter values which vary considerably with time and load variations.
Again, control theory, with its well established techniques for
sensitivity analysis, provides better hope.


1.5 SURVEY OF APPLICATIONS OF CONTROL THEORY


Wilkes was probably the first to strongly advocate the
exploitation of control theory for computer systems modeling. In his
paper [Wil73], he stated:


"We  are  not  yet  in  a position,  and  perhaps  never  will 
be,  to  write  down  equations  of  motion  for  computer  systems. 
However  this  does  not  exclude  the  design  of  a control 
system. Indeed, it is just in circumstances where the
dynamical equations are not fully understood or when the
system must operate in an environment that can vary over a
wide range that control engineering comes into its own."


The  paper  presents  many  arguments  for  applying  control  theory.  We  do 
not  intend  to  duplicate  those  arguments  here.  To  illustrate  his  ideas, 
Wilkes  proposed  a general  model  of  paging  systems. 

Adaptive  policies  for  many  components  of  operating  systems  have 
been  proposed.  Dynamic  tuning  of  allocation  policies  to  improve 
throughput  in  multiprogramming  systems  has  been  suggested  by  Wulf 
[Wul69]. An adaptive implementation of a load controller is described
in [Wil71]. Blevins and Ramamoorthy have investigated the feasibility
of a dynamically adaptive operating system [BlR76]. Two different
techniques for adaptive control of the degree of multiprogramming have
been described in [DKL76].
The need for a control-theoretic approach was also stressed by
Arnold and Gagliardi [ArG74]. They proposed a state space formulation
using resource utilization as the state variables. A dynamic
programming  approach  to  memory  management  and  scheduling  problems  is 
described  in  [Lew74,  Lew76].  A survey  of  some  early  applications  of 
statistical  techniques  to  computer  systems  analysis  can  be  found  in 
[Ash72]. 

The work most closely related to this thesis is that of Arnold
[Arn75, Arn78]. Using correlation properties of the memory demand
behavior of programs, he has investigated the applicability of the
Wiener filter theory to the design of a memory management policy.

1.6 PRINCIPAL CONTRIBUTIONS AND ORGANIZATION OF THE THESIS

In this thesis we propose the following general control-theoretic
approach to the formulation of resource management policies for
operating systems.

1. In order to develop a resource management policy, model the
corresponding program behavior as a stochastic process.

2. Using identification techniques and empirical data, identify a
suitable model structure for the process* and estimate typical
values of model parameters.

3. Based on the model, formulate a prediction strategy for the
stochastic process, and hence a resource management policy.

* The term process is used here exclusively in the control-theoretic
sense of stochastic process. To avoid confusion, the term task is used
to denote computer processes, e.g., we say "ready tasks" instead of
"ready processes".

The policy so obtained is dynamic in the sense that it varies the
allocation of the system resource to a user job depending upon the
recent past behavior of the job. It thus provides the run time
optimization not possible with the queueing theory approach. Also,
notice that the individuality of the job is fully exploited. The key
step in the approach is the formulation of the stochastic process model
in such a way that the allocation problem reduces to a prediction
problem. We exemplify this approach by formulating control-theoretic
policies for CPU scheduling and page replacement. Policies for
allocation of other shared resources (e.g., disks) can be formulated
similarly.

Formulation of the CPU scheduling policy is described in
Chapter II. The time taken by successive compute bursts of a program is
modeled as a stochastic process. It is shown that the main problem is
that of predicting the future demands of a job from its past behavior.
A few analytical results are derived concerning the increase in the mean
weighted flow time due to prediction error. Correlation techniques
(also called time series analysis techniques) are used to identify a
suitable model structure for the stochastic process. Empirical data on
the CPU demand behavior of users of an actual time sharing system are
used for this purpose. Details of the procedure used for modeling and
parameter estimation from the data are included. In particular, it is
shown that the CPU demand process is a stationary stochastic process
having very little autocorrelation. The efficiency of several
autoregressive moving average (ARMA) models is compared. The final
conclusion is that the gains from higher order models are very small and
that a zeroth order non-zero mean white noise (ARMA(0,0)) model is
appropriate for the process. Based on this conclusion, several different
prediction schemes are proposed. An adaptive scheduling algorithm called
"Shortest Predicted Remaining Processing Time" (SPRPT) is proposed.
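
For a white noise process with a non-zero mean, the minimum mean square
error prediction of the next value is simply the mean. The sketch below
illustrates that idea only; it is not the SPRPT algorithm of Chapter II.
The class MeanBurstPredictor, the function pick_next, and the burst values
are hypothetical names and data introduced for illustration, and the
sketch predicts the next CPU burst rather than the full remaining
processing time.

```python
class MeanBurstPredictor:
    """Running-mean predictor for the CPU bursts of one job (hypothetical helper)."""

    def __init__(self, initial_guess=1.0):
        self.count = 0
        self.mean = initial_guess  # used only until the first burst is observed

    def observe(self, burst):
        """Fold one measured CPU burst into the running mean."""
        self.count += 1
        if self.count == 1:
            self.mean = burst
        else:
            self.mean += (burst - self.mean) / self.count

    def predict(self):
        """Predicted length of the next burst under the white-noise assumption."""
        return self.mean


def pick_next(predictors):
    """Return the job whose predicted next burst is shortest."""
    return min(predictors, key=lambda job: predictors[job].predict())


if __name__ == "__main__":
    history = {"a": [2.0, 4.0, 6.0], "b": [1.0, 2.0]}  # made-up burst lengths
    predictors = {}
    for job, bursts in history.items():
        predictors[job] = MeanBurstPredictor()
        for burst in bursts:
            predictors[job].observe(burst)
    print(pick_next(predictors))
```

Because the mean is updated incrementally, the predictor needs only two
numbers per job, which keeps the dispatcher overhead small.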

In Chapter III, the problem of page replacement is formulated as a
prediction problem. Using a stochastic process model of memory demand
behavior, suggested by Arnold [Arn75], an expression is derived for the
cost of prediction error. The identification analysis shows that the
process is non-stationary. The non-stationarity is, however,
homogeneous in the sense that the first differences of the process are
stationary. An autoregressive integrated moving average model of order
1,1,1 (ARIMA(1,1,1)) is shown to be an appropriate model for the
process. A two step exponential predictor is derived for the model.
Based on this predictor, a new page replacement algorithm called the
"ARIMA" algorithm is proposed. Even though the origin of the algorithm
lies in complex control-theoretic ideas, its final implementation is
very simple. Moreover, it turns out that many conventional page
replacement algorithms like the working set algorithm [Den68], Arnold's
Wiener filter algorithm [Arn75], and the independent reference model
[ADU71] are special cases of the ARIMA algorithm. The control-theoretic
derivation of the conditions under which these algorithms are optimal is
presented.
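
As a rough illustration of what an ARIMA(1,1,1) model implies for
prediction, the sketch below simulates such a process and computes the
standard one-step minimum mean square error forecast. It is not the two
step exponential predictor derived in Chapter III, whose exact form is not
given here. The parameter names phi and theta, the zero initial
conditions, and the Gaussian shocks are assumptions made for illustration;
a real page reference process is binary.

```python
import random


def simulate_arima111(phi, theta, shocks):
    """Generate z_t with (z_t - z_{t-1}) = w_t and
    w_t = phi * w_{t-1} + a_t - theta * a_{t-1}, from zero initial conditions."""
    z = []
    w_prev = a_prev = z_prev = 0.0
    for a in shocks:
        w = phi * w_prev + a - theta * a_prev
        z_now = z_prev + w
        z.append(z_now)
        w_prev, a_prev, z_prev = w, a, z_now
    return z


def one_step_forecasts(phi, theta, z):
    """forecasts[t] is the prediction of z[t] made from z[0..t-1]."""
    forecasts = []
    z1 = z2 = 0.0   # z_{t-1} and z_{t-2}
    a_prev = 0.0    # estimated shock at time t-1 (the last forecast error)
    for z_now in z:
        f = z1 + phi * (z1 - z2) - theta * a_prev
        forecasts.append(f)
        a_prev = z_now - f
        z2, z1 = z1, z_now
    return forecasts


if __name__ == "__main__":
    random.seed(0)
    shocks = [random.gauss(0.0, 1.0) for _ in range(200)]
    z = simulate_arima111(0.5, 0.3, shocks)
    f = one_step_forecasts(0.5, 0.3, z)
    worst = max(abs((zt - ft) - at) for zt, ft, at in zip(z, f, shocks))
    print(worst)  # forecast errors coincide with the shocks up to rounding
```

With phi set to zero the forecast reduces to an exponentially weighted
moving average, f[t+1] = (1 - theta) * z[t] + theta * f[t], which is one
way to see how simpler predictors arise as boundary cases of the model.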

Chapter IV is devoted to developing new techniques for analysis of
binary processes like the memory demand process. The ARIMA model does
not take into account the fact that a binary process takes only two
values, 0 or 1. In this chapter, an attempt is made to remove this
discrepancy. It is shown that if a binary process is Markov of a finite
known order, it can be modeled as the output of a Boolean (switching)
system driven by a set of binary white noises. Modeling, estimation,
and prediction of the process using the Boolean model are described. A
method is developed for optimal non-linear prediction under any given
linear or non-linear cost criterion. All the results are then
generalized to k-ary processes, i.e., processes which take integer
values between 0 and k-1. The model is shown to be applicable to a
class of non-stationary processes also. Finally, the application of the
model to the problem of memory management is described.
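
The idea of predicting a binary process under a cost criterion can be
illustrated with a much simpler stand-in than the Boolean model of
Chapter IV. The sketch below assumes a first order binary Markov process,
estimates its transition probabilities by counting, and picks the
prediction with the lower expected cost. The function names, the cost
parameters, and the sample sequence are all hypothetical.

```python
from collections import Counter


def estimate_transitions(bits):
    """Estimate P(next = 1 | current = s) for s in {0, 1} from a 0/1 sequence."""
    ones = Counter()
    totals = Counter()
    for cur, nxt in zip(bits, bits[1:]):
        totals[cur] += 1
        ones[cur] += nxt
    # Fall back to 0.5 for a state that was never observed.
    return {s: (ones[s] / totals[s] if totals[s] else 0.5) for s in (0, 1)}


def predict(p_one, cost_miss, cost_false_alarm):
    """Choose the prediction (0 or 1) with the lower expected cost.

    cost_miss: cost of predicting 0 when a 1 occurs.
    cost_false_alarm: cost of predicting 1 when a 0 occurs.
    """
    expected_if_predict_0 = p_one * cost_miss
    expected_if_predict_1 = (1.0 - p_one) * cost_false_alarm
    return 1 if expected_if_predict_1 < expected_if_predict_0 else 0


if __name__ == "__main__":
    refs = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]  # made-up reference bits for one page
    probs = estimate_transitions(refs)
    print(probs, predict(probs[0], cost_miss=10.0, cost_false_alarm=1.0))
```

In a paging setting, a miss would correspond to a page fault and a false
alarm to keeping an unused page in memory, so an asymmetric cost can
justify retaining a page even when a reference is unlikely.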

In this thesis we make extensive use of control-theoretic terms
and concepts. However, since a majority of the readers of the thesis
are likely to be computer scientists, a tutorial approach is followed in
deriving the control-theoretic results. Whenever possible, simple and
intuitive explanations of the inferences based on control theory are
provided. A brief explanation of ARIMA models, which are used
extensively in this thesis, is given in Appendix A. Further details of
control-theoretic concepts can be obtained from [Nel73, BoJ70, Ast70,
BrH69].
The problem of CPU management is that of deciding which task from
among a set of ready tasks should be given the CPU next. In the literature
this problem is also referred to as low level scheduling, short term
scheduling, or process dispatching. There has been a considerable
amount of work on designing scheduling strategies for optimizing
different cost criteria, for single or multiprocessor systems, and for
different precedence constraints among the jobs [Cof76]. A common
underlying assumption in all this research is that the CPU time
required by each job is known. For example, the simplest scheduling
problem is that of scheduling n independent tasks with known CPU time
requirements of $t_1, t_2, \ldots, t_n$ respectively on a single
processor in such a way as to minimize the average finish time for all
users. If the jobs were scheduled in lexicographic order (i.e.,
1, 2, ..., n), the average finish time would be

$$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} (n - i + 1)\, t_i$$

A very well known solution to this problem is due to
Smith [Smi56]. This solution is called the "SPT" or Shortest Processing
Time rule, i.e., the jobs are given the CPU in the order of
non-decreasing CPU demand. For those not familiar with this fact the
following example should prove convincing.

Consider scheduling two jobs J1 and J2, each requiring
only one cycle of computation followed by output. The times required for
CPU and I/O are shown in Figure 2.1. The scheduling decision is to
decide which of the jobs gets the CPU first. Obviously there are only
two options: J1 first, or J2 first. The calculation of the average
response times to the users in the two cases is also shown in the
figure. It is clear that scheduling the shorter job first gives a lower
average response time.
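
The effect of job order on the average finish time can be checked
numerically. The sketch below is a minimal illustration that considers CPU
time only and ignores I/O; the function names and the demand values are
hypothetical and are not taken from Figure 2.1.

```python
from itertools import permutations


def average_finish_time(times):
    """Average finish time when jobs run back to back in the given order."""
    clock = 0.0
    total = 0.0
    for t in times:
        clock += t      # a job finishes once it and all earlier jobs are done
        total += clock
    return total / len(times)


def spt_order(times):
    """Shortest Processing Time rule: run jobs in non-decreasing CPU demand."""
    return sorted(times)


if __name__ == "__main__":
    demands = [8.0, 3.0, 5.0]  # made-up CPU demands
    best = min(average_finish_time(p) for p in permutations(demands))
    print(average_finish_time(demands))             # submission order
    print(average_finish_time(spt_order(demands)))  # SPT order
    print(best)                                     # best over all orders
```

For these values the SPT order attains the minimum over all six possible
orders, in agreement with the rule stated above.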

In the case of line printer scheduling, the service time
requirements can be predicted reasonably accurately from the size of the
file to be printed or by counting the number of linefeeds and formfeeds
if necessary. However, in the case of the CPU, there is no known method
of predicting the future CPU time requirements of the job. This makes
SPT and all similar scheduling strategies unimplementable.

In the absence of knowledge of program behavior, the operating
system designer is left to use his own ad hoc prediction strategy. One
such strategy is to assume that all the tasks are going to take the same
(a fixed quantum of) time. The tasks are, therefore, given the CPU in a
round robin fashion for the fixed quantum of time, and if a task has not
completed by the end of the quantum, it is put back on the run queue.
It is obvious that full-information strategies like SPT perform better
than no-information strategies like the fixed-quantum round robin. This
point is illustrated in Figure 2.2, where it is shown that, depending on
which job happens to be first in the queue, the response time is either
25 or 24.5. In both cases it is more than the SPT response time.
Figure 2.2: Round robin scheduling with unit quantum time
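
The comparison between round robin and SPT can be reproduced in spirit
with a small simulation. The sketch below assumes CPU-only jobs that all
arrive at time zero and no switching overhead, so it does not reproduce
the figures 25 and 24.5 quoted above, which also involve I/O times. The
function names and the demand values are hypothetical.

```python
from collections import deque


def round_robin_mean_finish(demands, quantum=1.0):
    """Mean finish time under round robin with a fixed quantum."""
    queue = deque(enumerate(demands))  # (job index, remaining CPU time)
    clock = 0.0
    finish = [0.0] * len(demands)
    while queue:
        job, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((job, remaining))  # not done: back of the run queue
        else:
            finish[job] = clock
    return sum(finish) / len(finish)


def spt_mean_finish(demands):
    """Mean finish time when jobs run to completion in SPT order."""
    clock = total = 0.0
    for d in sorted(demands):
        clock += d
        total += clock
    return total / len(demands)


if __name__ == "__main__":
    print(round_robin_mean_finish([3.0, 5.0]))  # short job first in the queue
    print(round_robin_mean_finish([5.0, 3.0]))  # long job first in the queue
    print(spt_mean_finish([3.0, 5.0]))
```

As in the text, the round robin result depends on which job is first in
the queue, and both round robin results exceed the SPT value.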
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Up  till 

now  we  have  assumed 

that  all 

t he  t asks 

arrive 

simultaneously 

and  are 

ready 

for 

processing 

at  the  same 

time. 

Obviously,  this 

is  not  ths 

ease  in 

a rerl  computer  system,  where, 

tasks 

arrive  intermittently.  The  optimal  scheduling  atrategy  ia  still 
basically  the  same.  At  each  point  in  time  one  makes  the  beat  selection 
from  among  those  jobs  available,  considering  only  the  remaining 
processing  time  of  the  job  that  is  currently  being  executed.  This 
generalization  of  SPf  is  called  the  Shortest  Rtxainlng  Processing  Time 
(SRPT)  rule  [S«i78j.  Tnis  minimize*  the  mean  flew  time  if  there  is  no 
extra  cost  irvolvsd  in  rcs'Jain*  a preempted  job.  Other  result s for  the 
case  of  simultaneous  arrival  are  similarly  applieabla.  Note,  in 
particular,  that  it  ia  not  necessary  to  havs  any  advance  Information 
about  job  arrival a. 
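
The SRPT rule can be illustrated with a small simulation. This is a
sketch under stated assumptions: time advances in unit steps, arrival
and processing times are positive integers (arrivals may be zero),
preemption is free, and the function names and the two-job example are
hypothetical.

```python
def srpt_mean_flow_time(jobs):
    """jobs: list of (arrival_time, processing_time), processing_time >= 1.
    At every unit step, run the arrived, unfinished job with the least
    remaining processing time. Returns the mean of (finish - arrival)."""
    remaining = [p for _, p in jobs]
    finish = [None] * len(jobs)
    clock = 0
    while any(f is None for f in finish):
        ready = [i for i, (a, _) in enumerate(jobs)
                 if a <= clock and finish[i] is None]
        if ready:
            i = min(ready, key=lambda j: remaining[j])
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = clock + 1
        clock += 1
    return sum(f - a for f, (a, _) in zip(finish, jobs)) / len(jobs)

def fcfs_mean_flow_time(jobs):
    """Same measure when jobs run to completion in arrival order."""
    clock = 0
    total = 0
    for a, p in sorted(jobs):
        clock = max(clock, a) + p
        total += clock - a
    return total / len(jobs)

jobs = [(0, 5), (1, 2)]  # hypothetical (arrival, processing) pairs
print(srpt_mean_flow_time(jobs), fcfs_mean_flow_time(jobs))
```

In this example the short job arriving at time 1 preempts the long one,
so SRPT gives a mean flow time of 4.5 against 5.5 for arrival order.
The scheduler never looks at arrivals that have not yet happened.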

Consider a program in a uniprogramming situation. Figure 2.3
shows the typical time behavior of the program*. The program oscillates
between CPU and I/O devices (disk, teletype, card reader, magnetic tape
etc.). Most programs have three phases. During the first phase they do
very little computation, spending most of the time collecting parameter
values from the user. The program then enters a computation phase
consisting generally of one or more loops. Finally, the program outputs
the results. The computation phase constitutes a major portion of the
life of the program. The cyclic nature of this phase (due to loops)
makes the program behavior somewhat predictable. While in a loop, the
program repeatedly references the same set of pages, and makes similar
CPU and I/O demands. Under the name of "Principle of Locality", this
behavior has been successfully exploited for memory management. The
working set strategy of memory management is partly based on this
principle. This strategy states that the set of pages referenced during
the last time interval T are more likely to be referenced in the near
future than other pages.

The CPU management equivalent of the WS strategy is to say that
the length of the last CPU burst is the likely length of the next CPU
burst. This strategy has been used in many operating systems, though
it has been implemented in many different forms. One implementation
method is to put a job taking a lot of CPU time on a low priority queue
so that the next time it will get the CPU only after those jobs which
have taken less CPU time this cycle. Unfortunately, this principle,
although commonly used, has never been theoretically explained.

* The same is applicable to a program in a multiprogramming situation
provided the time scale represents "virtual time".

Figure 2.3 : CPU and I/O demands of a typical program

One aim of the research reported here was to check the validity of
this "Next Equal to Last" (NEL) principle, and, if it was found invalid,
to find a strategy for the best prediction of the future CPU demand of a
program from its past behavior. We model the CPU demands of a job as a
stochastic process. The kth CPU burst is modeled as a random variable
z(k). One way of representing a stochastic process is to model it as
the output of a control system driven by white noise (see Figure 2.4).
Thus, as seen by the CPU scheduler, the program is like a control system
which generates successive CPU demands. A general time series model for
such a process is given by the following equation:

$$ z(t) = f(z(1), z(2), \ldots, z(t-1), e(1), e(2), \ldots, e(t)) $$

where z(t) represents the tth CPU burst and e(t) is the tth random
shock. A linearized and time invariant form of the above equation is the
well known ARMA(p,q) model (see Appendix A for details on ARMA models):

$$ z(t) = w + a_1 z(t-1) + \cdots + a_p z(t-p) + e(t) - b_1 e(t-1) - \cdots - b_q e(t-q) $$

We choose this formulation to model the CPU demand behavior of
programs, because there are well established techniques to find such
models from empirical data. Once a suitable ARMA model is found, it is
easy to convert it to other models (e.g., state space model), if
necessary.
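
As an illustration of the ARMA(p,q) recursion, the following sketch
generates a synthetic series. It is not the procedure used in this
study: the function name, the Gaussian shocks, the seed and the
parameter values are all assumptions, and terms that would refer to
times before the start of the series are simply dropped.

```python
import random

def simulate_arma(w, a, b, n, sigma=1.0, seed=0):
    """Generate z(0..n-1) from
    z(t) = w + sum_i a[i]*z(t-1-i) + e(t) - sum_j b[j]*e(t-1-j),
    where e(t) is Gaussian white noise with standard deviation sigma."""
    rng = random.Random(seed)
    z, e = [], []
    for t in range(n):
        shock = rng.gauss(0.0, sigma)
        value = w + shock
        for i, coeff in enumerate(a):      # autoregressive part
            if t - 1 - i >= 0:
                value += coeff * z[t - 1 - i]
        for j, coeff in enumerate(b):      # moving average part
            if t - 1 - j >= 0:
                value -= coeff * e[t - 1 - j]
        z.append(value)
        e.append(shock)
    return z

# Hypothetical ARMA(1,1) series of 200 bursts.
series = simulate_arma(w=2.0, a=[0.5], b=[0.3], n=200)
print(len(series))
```

With sigma set to zero the shocks vanish and an AR(1) series settles at
w/(1 - a_1), which is a convenient sanity check of the recursion.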

Figure 2.4: CPU demands modeled as a stochastic process

In order to study the effect of prediction errors, we need to
choose a performance measure. Consider the problem of scheduling n
independent tasks with CPU time requirements of t_1, t_2, ..., t_n
respectively on a single processor. A schedule consists of specifying
the sequence in which the tasks should be given to the processor. There
are many different performance measures for comparing different
schedules. The measure most commonly used for single processor
scheduling is "Mean Weighted Finish Time" (MWFT). It is defined as
follows:

$$ c = \frac{1}{n} \sum_{i=1}^{n} w_i f_i $$

where f_i is the finishing time of the ith task and w_i is the weight
or deferral cost of the task. It was shown by Smith [Smi56] that this
cost criterion is minimized by arranging the tasks in the order of
non-decreasing ratio t_i/w_i. If all the tasks have equal deferral
costs, i.e., w_i = 1 for all i, then the cost c is called average
finishing time or average response time. It follows from the above that
the average response time is minimized by sequencing the tasks in the
order of non-decreasing t_i. This rule is commonly known as the
"Shortest Processing Time" (SPT) rule.
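
Smith's ordering rule can be checked by brute force on a small case.
The sketch below is only an illustration: the task times and weights are
hypothetical, and the function names are ours.

```python
from itertools import permutations

def mwft(order, times, weights):
    """Mean weighted finish time of the tasks run in the given order."""
    clock = 0.0
    cost = 0.0
    for i in order:
        clock += times[i]           # finishing time of task i
        cost += weights[i] * clock
    return cost / len(order)

def smith_order(times, weights):
    """Task indices sorted by non-decreasing ratio t_i / w_i."""
    return sorted(range(len(times)), key=lambda i: times[i] / weights[i])

times = [4.0, 1.0, 3.0, 2.0]    # hypothetical CPU times
weights = [2.0, 1.0, 3.0, 1.0]  # hypothetical deferral costs

by_rule = mwft(smith_order(times, weights), times, weights)
by_search = min(mwft(p, times, weights)
                for p in permutations(range(len(times))))
print(by_rule, by_search)
```

Setting every weight to 1 reduces the rule to SPT, and the same search
then confirms that sorting by t_i gives the lowest average finish time.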


It has been shown that SPT also minimizes the following cost
criteria [CMM67]:

1. Mean power of finishing time: $\frac{1}{n} \sum_{i=1}^{n} f_i^{\alpha}$

2. Mean waiting time: $\frac{1}{n} \sum_{i=1}^{n} (f_i - t_i)$

3. Mean power of waiting time.

4. Mean lateness (time beyond deadline).

5. Mean tardiness if all jobs are tardy.

6. Mean number of tasks waiting.

However,  SPT  does  not  optimize  the  following  cost  criteria  (all  of  which 
are  functions  of  due  dates): 

1.  Maximum  lateness 

2.  Maximum  tardiness 

3.  Mean  tardiness 

4.  Number  of  tardy  jobs. 


Fortunately, due dates are rarely, if ever, specified for CPU scheduling
and hence the above criteria are of no practical interest. For a
computer user the most important criterion is a low response time*.
Since a job consists of several CPU-I/O cycles (or CPU-I/O tasks), the
response time is the sum of the finishing times of the tasks of the
job. An increase in the finishing time of a task directly contributes
to an increase in the response time.

* Some researchers believe that it is the consistency of response time
rather than minimality that is of concern to a user [HoP72]. For
example, if a program takes 1 minute on one day, it is quite bothersome
to the user if it takes 5 minutes on another day. However, the proper
control point for this criterion is load control (control of the number
of users allowed to log in or the number of batch jobs allowed to run
simultaneously). Therefore, we do not consider this criterion.

In the following, we derive a few analytical results concerning
the increase in mean weighted finishing time (MWFT) of tasks due to
prediction errors. We first consider a very general case where the time
requirements of all jobs are to be predicted. Then we consider another
case, where only one job is considered for prediction and the compute
time requirements of the other jobs are assumed to be known. The results
are presented as Theorems 2.3.1 and 2.3.2 below. The proofs of the
theorems are given in Appendix B.


2.3.1 THEOREM [Non-deterministic Case] : Consider a set of n tasks T_0,
T_1, ..., T_{n-1} with compute time requirements of t_0, t_1, ...,
t_{n-1} respectively, where all the times are unknown and are predicted
as \hat{t}_0, \hat{t}_1, ..., \hat{t}_{n-1}. The predictor is such that
the predicted time \hat{t}_i is a random variable with distribution F_i
and density f_i. Then the increase in the mean finishing time (MFT) due
to prediction error is given by

$$ c = \frac{1}{n} \sum_{i=0}^{n-1} (i - l_i)\, t_i $$

where the tasks are indexed in order of non-decreasing t_i, and l_i, the
predicted position of T_i, is

$$ l_i = \sum_{j=0,\; j \neq i}^{n-1} \int F_j(t)\, f_i(t)\, dt $$

2.3.2 THEOREM [Deterministic Case] : Given a set of n tasks
T_0, T_1, ..., T_{n-1} with compute time requirements of t_0, t_1, ...,
t_{n-1} respectively, where t_1, ..., t_{n-1} are known exactly and t_0
is predicted as t_p, then the increase in mean weighted finishing time
(MWFT) due to prediction error is given by:

$$ c = \frac{1}{n} \sum_{k \in I} | w_0 t_k - w_k t_0 | $$

where

$$ I = \{ k : t_k / w_k \in J \} $$

$$ J = \begin{cases} (t_0/w_0,\; t_p/w_0) & \text{if } t_0 < t_p \\ (t_p/w_0,\; t_0/w_0) & \text{if } t_p < t_0 \end{cases} $$

Informally, I is the set of indices of tasks lying between the predicted
and the real position of T_0; J is the interval between t_0/w_0 and
t_p/w_0.

2.3.2.1 COROLLARY : The increase in mean finishing time (w_k = 1 for all
k) due to t_0 predicted as t_p is given by:

$$ c = \frac{1}{n} \sum_{k \in I} | t_k - t_0 | $$

where I = {k : t_0 < t_k < t_p or t_p < t_k < t_0}
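
Corollary 2.3.2.1 can be checked numerically by scheduling the tasks
twice, once with the true times and once with t_0 replaced by its
prediction. The sketch below assumes unit weights and distinct task
times; the example values and function names are hypothetical.

```python
def mean_finish_time(order, times):
    """Mean finishing time of the tasks run in the given index order."""
    clock = 0.0
    total = 0.0
    for i in order:
        clock += times[i]
        total += clock
    return total / len(order)

def increase_by_simulation(times, predicted_t0):
    """MFT of the schedule built from the prediction minus the SPT MFT."""
    n = len(times)
    ideal = sorted(range(n), key=lambda i: times[i])
    believed = [predicted_t0] + list(times[1:])   # only t_0 is mispredicted
    actual = sorted(range(n), key=lambda i: believed[i])
    return mean_finish_time(actual, times) - mean_finish_time(ideal, times)

def increase_by_corollary(times, predicted_t0):
    """(1/n) * sum of |t_k - t_0| over tasks lying between t_0 and t_p."""
    t0 = times[0]
    lo, hi = min(t0, predicted_t0), max(t0, predicted_t0)
    return sum(abs(tk - t0) for tk in times[1:] if lo < tk < hi) / len(times)

times = [5.0, 2.0, 6.0, 8.0, 11.0, 3.0]  # times[0] is t_0
print(increase_by_simulation(times, 9.0), increase_by_corollary(times, 9.0))
```

Here the tasks with times 6 and 8 lie between t_0 = 5 and t_p = 9, so
both computations give (1 + 3)/6. A prediction of 5.5 crosses no other
task and the increase is zero.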

One implication of this corollary is that only those tasks that
lie in between the predicted and actual position of the task contribute
to the increase in MFT. Thus if the compute times of the various tasks
are arranged in increasing order and plotted as shown in Figure 2.5,
then the increase in MFT is represented by the hatched area. In the
special case when these compute times are linearly increasing, the
increase in MFT is proportional to the square of the prediction error.
This fact is stated by the following corollary, whose proof is given in
Appendix B.

2.3.2.2 COROLLARY : If t_k = kT, k = 1, 2, ..., n-1, then the increase
in MFT due to t_0 predicted as t_p is given approximately by:

$$ c \approx \frac{1}{2nT} (t_0 - t_p)^2 $$

Figure 2.5: Increase in MFT due to prediction error

It is seen from the above theorems that the error in prediction of
the compute time of a job affects the relative placement of all other
jobs in a very complicated fashion. For example, it is possible that the
times t_0, ..., t_{n-1} are far away from one another so that the
predicted value, even though away from t_0, may not result in any change
in the order and hence the net effect on MFT may be zero. On the other
hand, it is also possible that the times t_0, ..., t_{n-1} are very near
to each other so that a slight prediction error may result in a
substantial change in the schedule and hence in the MFT. Therefore,
except in some very special cases, e.g., in Corollary 2.3.2.2, it is not
possible to express the cost of misprediction as a function of
prediction error alone. That is, there is no one simple "f" such that
c = f(t_0 - t_p) represents the loss function. We, therefore, choose to
use the conventional least square criterion to predict the compute time.
In other words, we seek to predict in such a way that the average value
of the squared difference between the predicted and the actual value is
minimum.

2.4 DATA COLLECTION

This section describes the experiment to collect data on CPU
demands of actual programs. The experiment was conducted in a real user
environment in our Aiken Computation Laboratory. The laboratory has a
DECsystem-10 computer with the TOPS-10 operating system. The system is
mainly a research facility for use by graduate students.

The TOPS-10 operating system maintains a number of queues among
which the jobs are distributed. For example, there is a queue for jobs
waiting to be run, a queue for jobs waiting for disc I/O, a queue for
jobs waiting for TTY I/O etc. Thus the easiest way to get the data we
require is to watch the queue history of the program, i.e., to note the
queue the job is in and to repeat the observation at every clock tick*.

Table 2.1 gives major details of the experiment. It consisted of
19 different runs spread over a month. Each run consisted of randomly
selecting a user and watching his queue history for a period of about 45
minutes. Along with the queue history, which was observed every clock
tick, many other parameters like program name, memory used, etc. were
also recorded every second.

The data was later translated to produce the CPU demand processes
of individual programs. This produced 550 CPU demand processes
consisting of the lengths of successive CPU bursts (total CPU usage
between successive I/O requests). However, most of these program
processes were too short, i.e., consisted of only a small number of
observations (number of CPU bursts). Only 33 processes had 80 or more
observations. These were chosen for correlation analysis.

* In all subsequent discussions, the unit of time will be a clock tick,
called a "jiffy" in DEC terminology. A jiffy is the cycle period of the
line power supply, i.e., 1/60th of a second.

TABLE 2.1 : DATA COLLECTION EXPERIMENT

Duration of the experiment                    1 month
Number of runs                                19
Duration of each run                          45 minutes
Number of programs observed                   550
Number of programs with 80 or more bursts     33
Number of program histories analyzed          33
Number of user histories obtained             19
Number of user histories analyzed             8


We also obtained 19 user processes - one for each run. These
consist of lengths of successive CPU demands of the user regardless of
the program being run. Of these user processes, alternate (actually
only 8) processes were selected for analysis. The list of the processes
selected for analysis is shown in Table 2.2. The processes are named
"XXXXX.YNN", where XXXXX is either "USER" for user processes or the
program name, Y is the user identification (letters A, B, C, ...) and NN
is the serial number of the program in a particular run. Thus MAIN.R55
stands for the 55th program run by user R; "MAIN" is the name of the
program. Table 2.2 also gives the type of the program, the number of
observations in the process, its mean value, standard deviation, and
P-VALUE. The term P-VALUE will be explained later under the Chi-square
test.

The developmental nature of the environment is obvious from the
table. Notice that 14 (42%) of the programs are editing (SOS and TECO),
7 (21%) are FORTRAN programs, and 4 (12%) are EL1 programs. FORTRAN and
EL1 are the main languages used at our laboratory. Most users follow a
cycle of editing (TECO), compiling (FORTR), and running the program, and
then reediting etc. This is typical of research and development
environments. In a production environment in industry, less editing and
more application program execution is expected. However, as we shall
see later, the CPU demand behavior of editing programs and application
programs is not very different except that the mean value of the CPU
burst in an editing program tends to be much lower than that in an
application program. Therefore, it is plausible that the results
obtained here also hold in a production environment.

Table 2.2 : List of CPU Demand Processes Analyzed

S.No.  Process Name      N     Mean   Std. Dev.  P-VALUE   Program Type
                                                 (Chi-sq)
  1.   COMP2.G63       158     24.8      85.0     0.000    FORTRAN Program
  2.   ECL.B1           81     11.2      18.9     0.241    EL1 Program
  3.   ECL.B2          260     74.7     379.9     0.006    EL1 Program
  4.   ECL.S1          448     49.0     308.9     0.997    EL1 Program
  5.   ECL.S2          270     77.6     528.5     0.999    EL1 Program
  6.   FORTR.P21       113      5.4       7.8     0.178    FORTRAN compiler
  7.   FORTR.P30       234      6.3       7.6     0.412    FORTRAN compiler
  8.   FORTR.P8        349      6.2       6.7     0.000    FORTRAN compiler
  9.   FORTR.Q17       253      5.2       6.1     0.001    FORTRAN compiler
 10.   FRCDO.C1        141    606.7    1298.0     0.047    FORTRAN Program
 11.   FRCDO.C11       158    184.2     578.8     0.000    FORTRAN Program
 12.   M786S.U1        504      1.5       5.7     0.000    FORTRAN Program
 13.   MAIN.Q10        204      8.4       7.4     0.000    FORTRAN Program
 14.   MAIN.Q19         98      8.2       5.9     0.000    FORTRAN Program
 15.   MAIN.R55        129      3.4       3.4     0.230    FORTRAN Program
 16.   P.A19           222      9.2      23.4     0.000    FORTRAN Program
 17.   PIP.G18         140      1.1       0.7     0.088    Peripheral I/O
 18.   PIP.G45          84      1.0       0.8     0.000    Peripheral I/O
 19.   PIP.G60         225      0.9       0.3     0.000    Peripheral I/O
 20.   SOS.A21         422      1.9       2.7     0.000    Text Editor
 21.   SOS.A22          85      2.0       2.7     0.646    Text Editor
 22.   SOS.A23         103      1.5       1.3     0.374    Text Editor
 23.   SOS. kb         110      2.8       3.1     0.606    Text Editor
 24.   TECO.B8          90      3.7       6.6     0.916    Text Editor
 25.   TECO.P1          92      2.7       4.6     0.540    Text Editor
 26.   TECO.F20        199      5.7       6.3     0.018    Text Editor
 27.   TECO.G37        116     28.0     122.2     0.272    Text Editor
 28.   TECO.G38        221     17.2      64.8     0.140    Text Editor
 29.   TECO.055        114      4.3       4.5     0.000    Text Editor
 30.   TECO.H1         168      2.2       2.8     0.001    Text Editor
 31.   TECO.J5          90      6.4       6.4     0.174    Text Editor
 32.   TECO.PI         138      4.6      12.8     0.979    Text Editor
 33.   TECO.P13         84      4.3       5.5     0.568    Text Editor
 34.   USER.B          587     37.0     257.7     0.000
 35.   USER.D          259     52.0     329.2     1.000
 36.   USER.?          680      5.5      17.5     1.000
 37.   USER.H          413      6.5      39.9     1.000
 38.   USER.L          372      4.2       7.8     0.000
 39.   USER.N          471     30.2     187.5     0.999
 40.   USER.?         1629      7.1      17.5     0.000
 41.   USER.T          262     25.1     112.0     0.554

2.5 DATA ANALYSIS

The aim of data analysis is to find one suitable model structure
for CPU demand behavior of programs. The two main steps of data
analysis are model identification and parameter estimation. The first
step consists of studying the first and the second order statistics of
the data in order to identify a class of models suitable for the
process. In the second step, these models are fitted to each process to
find the maximum achievable gain. Finally, these different models are
compared to give one general model for all CPU demand processes. A
large part of the data analysis reported here was done on a time series
analysis package TS developed by Professor Vandaele of Harvard Business
School.

Statistical techniques are very often misused and results
misinterpreted. It is easy to draw misleading conclusions unless the
statistical procedures are fully understood and used properly. For
example, we have noticed that in most of the computer science
literature, correlation techniques are used without significance tests,
parameters estimated without their confidence intervals, and so on. We,
therefore, decided to explain the methodology along with the results.
In the following we have tried to describe the reasoning behind each
inference that we draw. The description is, however, brief due to space
limitations and references are provided for further details whenever
necessary.

Model identification consists of studying the characteristic
behavior of the first and the second order statistics of the data. The
goal of this step is to identify a model structure or a class of models
suitable for the data. Notice that this does not include finding an
exact model equation; that is part of the next step on parameter
estimation. The statistics used for model identification in this
analysis are data plots, autocorrelations, partial autocorrelations,
inverse autocorrelations, and the Chi square test. The inferences drawn
from these statistics are now described.

2.5.1.1 Data Plots :

The very first step in any identification procedure must be to plot the
data and to study its general time behavior. The plots of CPU demands
of some of the programs analyzed are shown in Figure 2.6. These are
typical of all the programs analyzed. Very often a program has just one
or two large CPU bursts which if plotted would obscure the details at
lower values. Therefore, the Y-axis scales have been so chosen that at
least 95% of the data are shown in the graph. Very large values are
shown cut off at the largest plottable value. Notice the following
characteristic behavior of these graphs:

A. No Trend : A trend (monotonic increase or decrease) in the data is
an indication of non-stationarity, though its absence does not confirm
stationarity. For a stationary series, the mean of the data does not
depend upon time; it is constant. Therefore, such a series takes trips
away from the mean, but it returns repeatedly during its history.
Fortunately, none of the CPU demand processes show a trend. Thus we can
hope for stationarity. A more conclusive test of stationarity via the
autocorrelation function will be described in the next section.

B. Violent Variations : Notice that the series does not stay at any
one level even for short intervals. This indicates that a WS-type
prediction scheme (z(t+1) = z(t), i.e., the current CPU burst size is a
good estimate of the next one) is probably not very valid. We may have
to use some more sophisticated scheme.

Figure 2.6a : Data plots of CPU demand processes

2.5.1.2 Autocorrelation Function :

As the name implies, the autocorrelation function is a measure of the
correlation between the present and the past observations. It is,
therefore, also a measure of the predictability of the future from the
past. Mathematically, the autocorrelation function is the normalized
autocovariance function. The latter is defined as follows:

$$ \mathrm{Cov}(k) = E[(z(t) - \bar{z})(z(t+k) - \bar{z})] $$

By dividing the autocovariance function by the variance (Cov(0)) we get
the autocorrelation function C(k):

$$ C(k) = \mathrm{Cov}(k) / \mathrm{Cov}(0) $$

Obviously, to be of any value, a stochastic process should have
finite memory, i.e., the present observation must be correlated only
with those in the finite past. In other words, the autocorrelation
function should die down to zero at very large lags. Such processes are
called stationary because after a while they achieve "equilibrium" and
their behavior does not depend upon initial conditions (long past).

Autocorrelation functions (ACF) of some of the CPU demand
processes are shown in Figure 2.7. These are typical of all the
programs analyzed. The dashed lines indicate the 95% confidence
interval of the ACF for the given sample. The expression given above
for C(k) is valid only for infinite sample sizes. For finite sample
sizes the calculated values are only an approximation to the theoretical
ACF. Thus if r(k) denotes the standard deviation of C(k), then a
calculated value for theoretically zero autocorrelation (C(k) = 0) may
lie anywhere between 0 ± 1.96 r(k) with 95% probability. In simple
words, any value between the dashed lines can be effectively assumed to
be zero with 95% confidence. The standard deviation r(k) can be
calculated by Bartlett's formula [Bar46]. In computer science
literature, this significance test is almost always omitted, resulting
in misleading conclusions.

The characteristic features of the ACF and the inferences that we
can draw are now described.

A. The ACF dies down to zero very quickly. This indicates that the CPU
demand process is stationary. If the ACF had not died down quickly, we
would have had to analyze the ACF of the first and higher differences of
the process.

B. The ACF is non-zero only for 1 or 2 lags. We can, therefore,
restrict our consideration to MA models of order less than 2.

It is very important to remember that the sample autocorrelations
are only estimates of the actual autocorrelations for the process which
generated the data at hand. Therefore, the analyst must be on the look
out for general characteristics which are recognizable in the sample
correlogram and not automatically attach significance to every detail.
For example, there is a 5% probability that a theoretically zero
correlation will show up as significant (above the dashed lines).
Therefore, one or two significant correlations at large lags in some of
the cases shown should not alarm us.

C. The ACF is positive. A positive correlation between successive
values indicates that a large CPU demand in one cycle implies a large
demand in the next cycle. Therefore, a program that took a long CPU
time during the last cycle can be expected to be CPU bound at least for
the next cycle and can be put on a lower priority queue.

D. The value of ACF is rather small. The ACF at lower lag values, even
though non-zero and positive, is really very small (of the order of
0.1). This partially dulls the hope expressed in the last inference.
The correlation being small, the gain in the predictability of the
future from the past will be small. In control theoretic terms, we are,
perhaps, headed for a zeroth order model.
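
The sample ACF and its significance band can be computed as in the
following sketch. It is an illustration, not the TS package: the
function names are ours, the series must not be constant, and the band
uses the white-noise special case of Bartlett's formula, in which the
standard deviation of C(k) is approximately 1/sqrt(N).

```python
import math

def sample_acf(z, max_lag):
    """Sample autocorrelations C(0..max_lag) of the series z."""
    n = len(z)
    mean = sum(z) / n
    dev = [x - mean for x in z]
    c0 = sum(d * d for d in dev) / n          # sample variance, Cov(0)
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf

def white_noise_band(n):
    """Half-width of the 95% band around zero for a white-noise series."""
    return 1.96 / math.sqrt(n)

def significant_lags(z, max_lag):
    """Lags whose sample autocorrelation falls outside the band."""
    band = white_noise_band(len(z))
    acf = sample_acf(z, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]

bursts = [1.0, 2.0] * 50     # hypothetical, strongly alternating series
print(sample_acf(bursts, 2), significant_lags(bursts, 2))
```

For a series of 100 observations the band is about 0.196, so sample
autocorrelations of the order of 0.1 would not be distinguishable from
zero at that sample size.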

2.5.1.3 Partial Autocorrelation Function :

The PACF is the dual of the ACF. Just as the ACF gives an idea of the
order of the MA models, the PACF gives an idea of the order of AR
models. If the process is modeled by an AR model of order p:

$$ z(t) = w + a_1 z(t-1) + a_2 z(t-2) + \cdots + a_p z(t-p) + e(t) $$

then the coefficient a_p of the last AR term z(t-p) is defined as the
value of PACF at lag p. Naturally, if the real process generating the
data had an AR model of order n, then we would expect PACF to be zero at
all lags greater than n. Thus the cut-off point of the PACF gives the
order of the AR model.

The PACFs of some of the CPU demand processes are shown in
Figure 2.8. The dashed lines indicate the 95% confidence interval for
the PACF for given sample sizes. It was shown by Quenouille [Que49]
that the approximate standard error of the PACF is n^{-0.5} for a sample
of size n. The characteristic attributes of these PACFs and their
implications are now described.

A. The PACF dies down to zero very quickly. In fact in most cases the
PACF is significant (above the dashed lines) only for lags 1 or 2. This
means that we do not have to bother about very high order AR models to
model these processes. A first or second order model will do.

B. The PACF is positive at low lags. Notice that the PACF for almost
all processes is positive at lag 1. Only in 1 or 2 cases is PACF(1)
negative. The positive value implies that a CPU burst gives a positive
contribution to the estimate of the next burst. It therefore confirms
our previous conclusion that a large CPU burst is more likely to be
followed by another large burst.
2.8. 1 .4  Chi  Square  Test  of  Randomness  : 

One  way  of  viewing  the  process  of  modeling  a time  series  is  as  an 
attempt  to  find  a transformation  that  reduces  the  observed  data  to 


PARTIAL  AUTOCORRELATION  FUNCTION  PARTIAL  AUTOCORRELATION  FUNCTION 

1.0  -0.0  -0.4  -0.4  -0.2  0.0  0.2  0.4  0.4  O.t  1.0  -1.0  -0.0  -0.4  -0.4  -0.2  0.0  0.2  0.4  0.4  0.0  1.0 
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random  noise.  The  first  question,  therefore,  is  whether  the  data  itself 
is  a random  noise.  Theoretically,  the  autocorrelation  of  random  noise 
will  be  zero  at  all  lags.  In  practice,  it  will  have  small  non-zero 
values.  Bartlett's  formula  for  the  standard  error  of  the  ACF  provides 
some  guidance  to  test  the  smallness.  A better  quantitative  test  of 
randomness  is  due  to  Box  and  Pierce  [BoP70].  They  have  suggested  a 
statistic  that  offers  a test  of  the  smallness  of  a whole  set  of  sample 
autocorrelations  for  lags  1 through  k.  This  is  the  Q statistic  given  by 
Q • N jh  C(j)2 

Q is approximately Chi-square distributed with k degrees of freedom.
Using the Q statistic one can calculate the probability that the given
sample came from a white noise process. This probability is listed in
Table 2.2 under P-VALUE. Notice that 22 of the 33 processes
analyzed have non-zero P-VALUE, 16 have P-VALUE greater than 10%, and 8
have P-VALUE greater than 50%. Of the 8 user processes analyzed 4 have
a P-VALUE of 1, i.e., they are almost surely random noise. The high
randomness of the user processes is a result of their being mixtures of
several program traces, many of which have no relation to one another.
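
The Q statistic and its P-VALUE can be sketched in a few lines. The block below is only an illustration, not the program used in this study: the function names are hypothetical, and the chi-square upper-tail probability uses a closed form that holds only for an even number of degrees of freedom.

```python
import math

def sample_autocorrelation(z, lag):
    """Sample autocorrelation r(lag) of the series z."""
    n = len(z)
    mean = sum(z) / n
    denom = sum((x - mean) ** 2 for x in z)
    num = sum((z[t] - mean) * (z[t + lag] - mean) for t in range(n - lag))
    return num / denom

def box_pierce_q(z, k):
    """Q = N * sum of squared sample autocorrelations for lags 1..k."""
    n = len(z)
    return n * sum(sample_autocorrelation(z, j) ** 2 for j in range(1, k + 1))

def chi2_upper_tail_even_df(q, df):
    """P(chi-square with df degrees of freedom > q), valid for even df only.

    Assumption of this sketch: restricting to even df gives the closed form
    exp(-q/2) * sum_{i < df/2} (q/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With the values printed later in Figure 2.9, `chi2_upper_tail_even_df(0.679, 6)` rounds to 0.995 and `chi2_upper_tail_even_df(3.59, 6)` rounds to 0.732, which agrees with the P-VALUE column of that output.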

2.5.1.5 Inverse Autocorrelation Function :

The inverse autocorrelations of a time series are defined to be the
autocorrelations associated with the inverse of the spectral density of
the series, i.e.,

$$ \mathrm{IACF} = \mathcal{F}^{-1}\left[ \frac{1}{\mathcal{F}(\mathrm{ACF})} \right] $$

where $\mathcal{F}$ denotes the Fourier transform and $\mathcal{F}^{-1}$
its inverse. The IACFs were first proposed by Cleveland [Cle72]. He
claims that they are useful in identifying non-zero coefficients in an
ARMA model. However, their utility in model identification is still a
point of debate among statisticians [Par72]. We calculated the inverse
autocorrelation functions for all of our CPU demand data processes.
However, in most cases these functions did not give much additional
information. Only in some (2 or 3) cases, where the processes behaved
abnormally (a low order ARMA model was not adequate), did we gain some
insight into modeling these particular cases.

In order to illustrate the use of IACF, let us consider one such
case : the CPU demand behavior of program FRCDO.C11. Its ACF and PACF
were insignificant everywhere except at lags 5, 6, and 14. Obviously, a
low order ARMA model would not work for this process. As we will see in
the next section on model fitting, an AR(2) model resulted in only
1.6% improvement over a zeroth order model. The inverse autocorrelations
for this process (assuming orders of 1 through 8 for the AR part of the
model) are shown in Table 2.3. Notice that all columns except 5 and 6
are close to zero. Cleveland suggests that this indicates an appropriate
model would have a 6th order AR part with only the 5th and 6th
coefficients non-zero and all other coefficients zero, i.e., a model of
the following type :

$$ z(t) = w + a_5 z(t-5) + a_6 z(t-6) + e(t) $$

Obviously, these high order models are of no interest to us
because of their applicability only in rare cases, and also because of
the rather small gain even in these cases.
Table 2.3 : Inverse Autocorrelations of FRCDO.C11

 m    ri(1)   ri(2)   ri(3)   ri(4)   ri(5)   ri(6)   ri(7)   ri(8)

 1   -0.076
 2   -0.091   0.100
 3   -0.074   0.087   0.080
 4   -0.072   0.090   0.077   0.014
 5   -0.078   0.066   0.046   0.038  -0.162
 6   -0.011   0.062   0.026   0.001  -0.135  -0.189
 7   -0.018   0.056   0.027   0.003  -0.132  -0.190   0.016
 8   -0.019   0.095   0.052  -0.003  -0.140  -0.205   0.026  -0.107

ri(n) = nth inverse autocorrelation
m = Order of the AR model used for calculating ri.
2.5.2 Parameter Estimation :

In  order  to  find  one  general  model  for  all  CPU  demand  processes  we 
fitted  several  models  to  each  process,  and  found  the  best  parameter 
estimates  and  hence  the  maximum  improvement  available.  The  details  of 
the  model  fitting  procedure  and  the  results  obtained  are  the  topic  of 
this  section. 

The net conclusions of the identification step discussed in the
last section are the following :

1.  The  CPU  demand  process  is  a stationary  process. 

2.  The  order  of  the  ARMA  model  required  to  model  the  process  is 
rather  small  - of  the  order  of  1 or  2. 


We, therefore, limited our search for the best model to the class
of ARMA(p,q) models with p+q <= 2. This class includes the following six
models.

No.   p  q   Model Type   Model equation

1.    0  0   White Noise  z(t) = w + e(t)
2.    0  1   MA(1)        z(t) = w + e(t) - b1 e(t-1)
3.    0  2   MA(2)        z(t) = w + e(t) - b1 e(t-1) - b2 e(t-2)
4.    1  0   AR(1)        z(t) = w + a1 z(t-1) + e(t)
5.    1  1   ARMA(1,1)    z(t) = w + a1 z(t-1) + e(t) - b1 e(t-1)
6.    2  0   AR(2)        z(t) = w + a1 z(t-1) + a2 z(t-2) + e(t)
Let us consider the general case of fitting an ARMA(p,q) model to
a process. The model is

$$ z(t) - a_1 z(t-1) - \ldots - a_p z(t-p) = w + e(t) - b_1 e(t-1) - \ldots - b_q e(t-q) $$

The parameter estimation problem is to find the "best" estimate of
the parameters $\theta = \{w, a_1, \ldots, a_p, b_1, \ldots, b_q\}$, and
the variance $s_e^2$ of e(t). Here the best is defined in the sense of
maximum likelihood (ML). The likelihood function is the probability
$p(z \mid \theta, s_e^2)$ that a given set of parameter values would have
given rise to the observed data. If the noise e(t) is assumed to be
normal then it can be shown that the ML estimates are obtained by
minimizing the sum-of-squares function [Nel72, p94]:

$$ SSR(\theta) = \sum_{t=1}^{N} e^2(t) $$

Once the MLE of $\theta$ has been obtained, the MLE of $s_e^2$ is just

$$ s_e^{*2} = \frac{SSR(\theta^*)}{N} $$

The superscript * denotes ML estimate. We illustrate the estimation
procedure with a sample case.

A Sample Case : Figure 2.9 presents the output from the program ESTIMA
for the case of fitting an ARMA(1,1) model to the ECL.S2 process. The
first portion of the output describes the problem, i.e., number of
observations, order of differencing, initial guess values for parameters
etc. Then the iterations towards the ML estimate begin. The Gauss-Newton
method is used to find the optimum. We now describe the importance of
each of the results shown in Figure 2.9.
    CPU DEMAND BEHAVIOR OF ECL (S-2)

    NOBS = 270
    INITIAL VALUES
      AR( 1)    0.1000E+00
      MA( 1)   -0.1000E+00
      CONST     0.7500E+02

    MODEL WITH D = 0  DS = 0  S = 0
    MEAN = 81.67   SD = 528.9   (NOBS = 270)
    INIT SSR = 0.7540E+08

    ITER   SSR           ESTIMATES
                         1            2            3
      1    0.7473E+08    3.905E-02   -7.193E-02    78.0
      2    0.7472E+08   -1.716E-02   -0.122        82.9
      3    0.7471E+08   -7.497E-02   -0.180        87.7
      4    0.7471E+08   -5.205E-02   -0.157        85.9
      5    0.7471E+08   -6.595E-02   -0.171        87.0

    REL. CHANGE IN SSR <= 0.1000E-05
    FINAL SSR = 0.7471E+08
    5 ITERATIONS

    PARAMETER ESTIMATES

                  EST        SE    EST/SE     95% CONF LIMITS
      AR( 1)   -0.066     0.569    -0.116     -1.181     1.049
      MA( 1)   -0.171     0.563    -0.304     -1.273     0.932
      CONST    87.001    59.532     1.461    -29.682   203.684

    EST.RES.SD = 5.2899E+02
    EST.RES.SD(WITH BACK FORECAST) = 5.2899E+02
    R SQR = 0.011
    ADJ R SQR = 0.004
    D.F. = 267
    F = 1.474 (2,267 DF)   P-VALUE = 0.231

    CORRELATION MATRIX
                AR( 1)   MA( 1)
      MA( 1)     0.994
      CON( 3)   -0.780   -0.776

    AUTOCORRELATIONS OF RESIDUALS
      LAGS   ROW SE
      1- 8    .06    -0.00  -0.01  -0.02   0.04  -0.02  -0.01  -0.01  -0.01
      9-16    .06    -0.02  -0.01  -0.01  -0.01  -0.01  -0.01  -0.02  -0.01

      CHI-SQUARE TEST                P-VALUE
      Q( 8) =  .679     6 D.F.       0.995
      Q(16) = 1.06     14 D.F.       1.000

    CROSS CORRELATIONS OF RESIDUALS AND THE SERIES
      ZERO LAG = 0.99

      LAGS   E(T),Z(T+K)
      1- 8    0.10  -0.01  -0.02   0.03  -0.01  -0.01  -0.01  -0.01
      9-16   -0.02  -0.01  -0.01  -0.01  -0.01  -0.01  -0.02  -0.02

      CHI-SQUARE TEST                P-VALUE
      Q( 8) = 3.59      6 D.F.       0.732
      Q(16) = 4.03     14 D.F.       0.995

      LAGS   E(T+K),Z(T)
      1- 8   -0.00  -0.01  -0.02   0.03  -0.02  -0.01  -0.01  -0.02
      9-16   -0.02  -0.01  -0.01  -0.01  -0.01  -0.01  -0.02  -0.02

      CHI-SQUARE TEST                P-VALUE
      Q( 8) =  .649     6 D.F.       0.996
      Q(16) = 1.10     14 D.F.       1.000

Figure 2.9 : Output of the Parameter Estimation Program
A. Stopping Criterion : Regardless of the optimization technique used
one has to decide when to stop iteration. Various stopping criteria and
justifications for their use have been discussed by Muralidharan and
Jain [MuJ75]. The ESTIMA program stops whenever any of the following
criteria is satisfied:

1. Relative change in SSR is less than $10^{-6}$.

2. Absolute change in SSR is less than $10^{-6}$.

3. The step size is less than $10^{-6}$.

4. Number of iterations reaches a limit of 30 (a bad likelihood
function).


In almost all cases of CPU demand modeling, the optimization
program stopped on the first criterion.
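
The four stopping rules listed above can be sketched as a single check called once per iteration. This is a hypothetical illustration, not the ESTIMA source: the function name, argument names and returned labels are assumptions, and only the thresholds ($10^{-6}$ and 30 iterations) come from the list above.

```python
def should_stop(prev_ssr, curr_ssr, step_size, iteration,
                tol=1e-6, max_iter=30):
    """Return the name of the first stopping rule that fires, else None."""
    abs_change = abs(prev_ssr - curr_ssr)
    # Guard against division by zero when the previous SSR is exactly zero.
    rel_change = abs_change / abs(prev_ssr) if prev_ssr != 0 else abs_change
    if rel_change < tol:
        return "relative change in SSR"
    if abs_change < tol:
        return "absolute change in SSR"
    if step_size < tol:
        return "step size"
    if iteration >= max_iter:
        return "iteration limit"
    return None
```

Because SSR values in this study are of the order of 1E+08, the relative test is reached long before the absolute one, which fits the remark that the program almost always stopped on the first criterion.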


B. Confidence Interval : It is important to remember that the MLE of the
parameters are, after all, random variables since they are functions of
the data. It can be shown [BoJ70, p226] that the MLE in large samples are
jointly normally distributed with mean value equal to the true parameter
values and variance-covariance matrix given by :

$$ V(\theta^*) = 2 s_e^2 Q^{-1} $$

where the (i,j)th element of the matrix Q is given by

$$ Q_{ij} = \frac{\partial^2 S}{\partial \theta_i \, \partial \theta_j}, \qquad i, j = 1, \ldots, p+q+1 $$

and S is the sum-of-squares function. Taking the square root of the
diagonal elements of the estimated variance-covariance matrix, we get
the estimated standard deviation of the parameter estimates or the
standard error denoted $SE(\theta_i^*)$. The 95% confidence interval for
$\theta_i$ is then given by $\theta_i^* \pm 1.96 \cdot SE(\theta_i^*)$.
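
As a small worked check of the interval formula, the sketch below recomputes the 95% limits from the EST and SE columns of Figure 2.9. The function name is hypothetical; the numbers are the ones printed in the figure.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) limits: estimate +/- 1.96 * standard error."""
    half_width = 1.96 * standard_error
    return estimate - half_width, estimate + half_width

# EST and SE values as printed in Figure 2.9 for the ECL.S2 process.
figure_2_9 = {
    "AR(1)": (-0.066, 0.569),
    "MA(1)": (-0.171, 0.563),
    "CONST": (87.001, 59.532),
}

for name, (est, se) in figure_2_9.items():
    lower, upper = confidence_interval_95(est, se)
    print(f"{name:6s} {lower:9.3f} {upper:9.3f}")
```

The AR(1) and CONST limits reproduce the printed values; the MA(1) lower limit differs in the last digit because EST is rounded in the listing. An interval that contains zero, as the AR(1) and MA(1) intervals do here, means the coefficient cannot be distinguished from zero at this level.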

C. Utility of the Model : We test the utility of the model in terms of
$R^2$ and the F-test. $R^2$ is the fraction of the variance explained by
the model. It is calculated by the following formula :

$$ R^2 = 1 - \frac{\mathrm{var}[e(t)]}{\mathrm{var}[z(t)]} = 1 - \frac{\frac{1}{N} \sum e^2(t)}{\frac{1}{N} \sum (z(t) - \bar{z})^2} $$

This criterion for measuring the utility of a model does not
penalize the model for its use of parameters. Generally the addition of
any parameter to a model may be expected to reduce SSR and $s_e^2$ since
the additional parameter offers one additional degree of freedom along
which to reduce them. Consequently, to penalize a model for its use of
parameters or degrees of freedom, one may compute estimates of $s_e^2$ by
dividing SSR by N-k, i.e., the net remaining degrees of freedom, where k
is the number of parameters in the model. This corrected measure of
improvement is called Adjusted $R^2$ :

$$ R^2_{adj} = 1 - \frac{\frac{1}{N-k} \sum e^2(t)}{\frac{1}{N-1} \sum (z(t) - \bar{z})^2} = \frac{N-1}{N-k} R^2 - \frac{k-1}{N-k} $$

A negative or very low value of $R^2_{adj}$ indicates that the model is
really not worth the trouble.
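
The two measures can be sketched as follows. The function names are hypothetical; the check at the end uses N = 270, k = 3 and R SQR = 0.011 from Figure 2.9.

```python
def r_squared(z, residuals):
    """Fraction of the variance of z explained by the model."""
    n = len(z)
    mean = sum(z) / n
    total = sum((x - mean) ** 2 for x in z)
    unexplained = sum(e ** 2 for e in residuals)
    return 1.0 - unexplained / total

def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the k parameters used on n observations."""
    return 1.0 - (n - 1) / (n - k) * (1.0 - r2)

# Values printed in Figure 2.9: the result rounds to the listed 0.004.
print(round(adjusted_r_squared(0.011, 270, 3), 3))
```

Both forms of the adjusted formula give the same number: (269/267)(0.011) - 2/267 is also about 0.0036.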
The utility of a model can also be assessed by the F-test. This
test compares two hypotheses :

$$ H_1 : \theta \neq 0 $$
$$ H_0 : \theta = 0 $$

It can be shown that the likelihood ratio statistic F given by

$$ F = \frac{R^2 / (k-1)}{(1 - R^2) / (N-k)} $$

is F-distributed with k-1 and N-k degrees of freedom [BoJ70, p266]; in
Figure 2.9, for example, N = 270 and k = 3 give (2,267) degrees of
freedom. The probability that an F-distributed variable exceeds the
calculated F value is given in the output as P-VALUE. A small P-VALUE
implies that the parameter values are significantly different from zero.
For example, the ARMA(1,1) model for ECL.S2 has a P-VALUE of 0.231. This
value is too large to reject H0 at the usual 5% level, and it indicates
the futility of the ARMA model for this process.
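
The F value and its P-VALUE for this example can be checked with a short sketch. The function names are hypothetical, and the tail probability uses a closed form that is valid only when the numerator has exactly 2 degrees of freedom, as it does for the two-coefficient ARMA(1,1) model here.

```python
def f_statistic(r2, n, k):
    """F ratio for a model with k parameters fitted to n observations."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

def f_upper_tail_2_numerator_df(f, d2):
    """P(F > f) for an F variable with (2, d2) degrees of freedom.

    Assumption of this sketch: with 2 numerator degrees of freedom the
    tail probability has the closed form (1 + 2 f / d2) ** (-d2 / 2).
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)

# F value and degrees of freedom as printed in Figure 2.9.
print(round(f_upper_tail_2_numerator_df(1.474, 267), 3))
```

The printed value matches the P-VALUE of 0.231 in Figure 2.9. Recomputing F from the rounded R SQR of 0.011 gives about 1.48 rather than 1.474, a difference due only to rounding in the listing.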

D. Back Forecasting : The values of e(t) in the expression for SSR are
calculated as follows

$$ e(t) = z(t) - \sum_{i=1}^{p} a_i z(t-i) + \sum_{i=1}^{q} b_i e(t-i) - w $$

It is immediately apparent that there is a problem here because we have
no values for e(0), ..., e(-q+1) and z(0), ..., z(-p+1). One solution is
to assume e(1) through e(q) as zero and start using the above equation
from t=q+1. An alternative solution to this starting value problem is a
back forecasting procedure suggested by Box and Jenkins [BoJ70, p212].
The ESTIMA output gives the standard deviation of the residuals with and
without the back forecasting.
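
The residual recursion with zero starting values can be sketched as below. This is an illustration only, not the ESTIMA code and not the back forecasting procedure: the function names are hypothetical, and the sketch starts the recursion after the first max(p,q) observations so that every z(t-i) it needs is available, which is an assumption beyond the rule stated above.

```python
def arma_residuals(z, w, a, b):
    """Residuals e(t) of an ARMA(p,q) model with zero starting values.

    z : observed series (Python index 0 holds z(1))
    w : constant term
    a : AR coefficients [a1, ..., ap]
    b : MA coefficients [b1, ..., bq]
    """
    p, q = len(a), len(b)
    start = max(p, q)            # the first `start` residuals are set to zero
    e = [0.0] * len(z)
    for t in range(start, len(z)):
        ar_part = sum(a[i] * z[t - 1 - i] for i in range(p))
        ma_part = sum(b[i] * e[t - 1 - i] for i in range(q))
        e[t] = z[t] - ar_part + ma_part - w
    return e

def sum_of_squares(e):
    """SSR: the quantity minimized by the estimation program."""
    return sum(x * x for x in e)
```

For example, `arma_residuals([1.0, 3.0, 1.0], 1.0, [], [0.5])` returns `[0.0, 2.0, 1.0]`, giving an SSR of 5.0.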


E. Correlation Matrix : Also given in the ESTIMA output is the
estimated correlation matrix of the parameter estimates. The
correlation between two parameters is obtained by taking their estimated
covariance from the variance-covariance matrix described before and
dividing it by the product of their standard errors. In this example,
the correlation between a1 and b1 is estimated to be 0.994. This high
correlation indicates that one of the two parameters is highly dependent
on the other and therefore one of them can be omitted from the model and
the model order reduced without seriously affecting the performance.
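
The conversion from a variance-covariance matrix to a correlation matrix described above can be sketched as follows. The function name and the 2x2 example values are hypothetical and are not taken from the ESTIMA run.

```python
import math

def correlation_from_covariance(cov):
    """Divide each covariance by the product of the two standard errors."""
    n = len(cov)
    se = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]

# Hypothetical covariance matrix: standard errors 2 and 3, covariance 3.
example = [[4.0, 3.0],
           [3.0, 9.0]]
print(correlation_from_covariance(example))
```

The off-diagonal entry is 3 / (2 * 3) = 0.5 and the diagonal entries are 1, as expected for a correlation matrix.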

F. Diagnostic Checks on the Residuals : There are two kinds of tests
that can be applied to residuals to test the adequacy of a model. These
are the whiteness test and the cross-correlation test. If the model is
adequate, we would expect to find that the residuals e(t) have the
property of random noise - in particular, that they are not serially
correlated. Autocorrelation, if evident in the residuals, may help to
suggest the direction in which the model should be modified. To test
whiteness, we use the Q-statistic discussed before. In the example
shown, the Q-statistic for lags 1 through 16 is 1.06, which corresponds
to a probability (P-value) of 1.000. This high P-value confirms the
uncorrelatedness of the residuals.

The cross-correlation test is based on the correlation between the
residuals and the process. An important property of the theoretical
disturbances is that they are correlated with the present and future
values of z, but not with the past values, i.e.,

$$ \mathrm{Cov}[z(t-k), e(t)] = 0, \qquad k > 0 $$
$$ \mathrm{Cov}[z(t+k), e(t)] \neq 0, \qquad k \geq 0 $$

As an additional check on the model, the corresponding sample
cross-correlations between residuals and the process are displayed in
the ESTIMA output.
2.5.3 Choosing a General Model :

The results of parameter estimation for 5 different models of the 41
different CPU demand processes analyzed are listed in Tables 2.4 through
2.8. In these tables, $w$, $a_1$, $a_2$, $b_1$, and $b_2$, if present,
are model parameters; $s_e$ is the standard deviation of the residuals.
$R^2$, $R^2_{adj}$, P-value for F-test, and P-value for Chi-square test
are as explained previously in sections 2.5.2.C and 2.5.1.4.

Our next task is to choose the model which best represents the CPU
demand behavior of programs. There are many ways of defining the
"best". For example, one criterion that we first explored was the
following :

     For each process find the best model (the one giving the
     highest $R^2$, or $R^2_{adj}$), and choose the model that is
     best for a majority of the processes.

We rejected this criterion on the grounds that it does not reflect
the fact that for programs with large variances even a small $R^2$ is
good, whereas for programs with low variances even a large $R^2$ is not
much use. Thus the net reduction in SSR rather than $R^2$ should be the
criterion for selection. We, therefore, decided to use the following
criterion :

     For each type of model, find the total (sum of) reduction in
     SSR achieved by the model for all programs, and choose the
     model that gives the highest reduction.
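
The selection criterion adopted above can be sketched as follows. The function names and the SSR numbers in the example are hypothetical and chosen only to show why the total reduction can favor a different model than a per-process comparison of $R^2$ would.

```python
def total_ssr_reduction(ssr_zeroth, ssr_by_model):
    """Total reduction in SSR per model, as a fraction of the zeroth order SSR.

    ssr_zeroth   : {process: SSR of the zeroth order model}
    ssr_by_model : {model: {process: SSR of that model}}
    """
    base = sum(ssr_zeroth.values())
    return {
        model: sum(ssr_zeroth[p] - ssr[p] for p in ssr_zeroth) / base
        for model, ssr in ssr_by_model.items()
    }

def best_model(ssr_zeroth, ssr_by_model):
    """Model with the highest total reduction in SSR."""
    reductions = total_ssr_reduction(ssr_zeroth, ssr_by_model)
    return max(reductions, key=reductions.get)

# Hypothetical data: process A has a large variance, process B a small one.
zeroth = {"A": 100.0, "B": 10.0}
models = {
    "AR(1)": {"A": 95.0, "B": 5.0},   # halves the SSR of B, 5% on A
    "MA(1)": {"A": 88.0, "B": 10.0},  # 12% on A, nothing on B
}
print(best_model(zeroth, models))
```

In this made-up case AR(1) has the more impressive $R^2$ on process B, yet MA(1) removes more SSR in total (12 units against 10) and is the one selected.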


Table 2.4 : Parameter Estimation for ARMA(1,1) Model
            z(t) = w + a1 z(t-1) + e(t) - b1 e(t-1)

Process          w     a1     b1      se     R2  R2adj   P-VAL   P-VAL
Name                                                    F-test  Chi-sq

COMP2.G63     7.00   0.71   0.38    77.3  0.185  0.175   0.000   0.190
ECL.B1        8.98   0.19  -0.17    18.1  0.118  0.095   0.007   0.884
ECL.B2       20.89   0.72   0.53   368.7  0.066  0.058   0.000   0.969
ECL.S1       27.55   0.44   0.44   309.6  0.000  0.004   0.000   0.990
ECL.S2       83.68  -0.08  -0.19   527.4  0.011  0.004   0.214   0.000
FORTR.P21     0.74   0.86   0.75     7.7  0.042  0.025   0.094   0.377
FORTR.P30     0.76   0.88   0.80     7.6  0.025  0.017   0.053   0.925
FORTR.P8      0.10   0.98   0.93     6.7  0.044  0.038   0.000   0.049
FORTR.Q17     0.82   0.84   0.69     6.0  0.068  0.061   0.000   0.364
FRCDO.C1    394.10   0.36   0.70  1232.1  0.112  0.100   0.000   0.364
FRCDO.C11   243.75  -0.32  -0.42   579.7  0.010  0.003   0.470   0.001
M786S.U1      1.60  -0.01  -0.39     5.4  0.127  0.123   0.000   0.000
MAIN.Q10      1.18   0.85   0.55     6.5  0.244  0.237   0.000   0.955
MAIN.Q19      1.81   0.77   0.42     5.3  0.207  0.190   0.000   0.000
MAIN.R55      1.41   0.59   0.41     3.4  0.043  0.028   0.064   0.591
P.A19         2.70   0.70   0.47    22.5  0.093  0.085   0.000   0.000
PIP.G18       0.53   0.56   0.35     0.7  0.049  0.035   0.031   0.409
PIP.G45      -0.04   1.01   0.33     0.6  0.500  0.488   0.000   0.552
PIP.G60      -0.02   0.10   0.82     0.3  0.292  0.285   0.000   0.910
SOS.A21       0.03   0.98   0.92     2.6  0.077  0.072   0.000   0.000
SOS.A22       0.26   0.87   0.73     2.7  0.064  0.041   0.067   0.997
SOS.A23       0.41   0.74   0.83     1.4  0.015  0.005   0.476   0.370
SOS.A6       -0.04   1.01   0.96     3.1  0.029  0.011   0.210   0.557
TECO.B8       6.16  -0.66  -0.53     6.6  0.031  0.008   0.258   0.966
TECO.F1       0.24   0.10   0.98     4.6  0.023  0.001   0.348   0.594
TECO.F20      0.82   0.86   0.73     6.2  0.056  0.047   0.003   0.731
TECO.G37      1.74   0.94   0.91   123.0  0.005  0.013   0.763   0.268
TECO.G38      2.47   0.85   0.78    64.6  0.019  0.010   0.124   0.221
TECO.G55      0.69   0.84   0.59     4.2  0.160  0.145   0.000   0.960
TECO.H1       0.56   0.74   0.52     2.7  0.100  0.089   0.000   0.706
TECO.J5       0.82   0.87   0.72     6.3  0.069  0.048   0.044   0.649
TECO.P1       4.45   0.05  -0.10    12.8  0.021  0.006   0.240   0.000
TECO.P13      2.57   0.41   0.25     5.6  0.027  0.003   0.325   0.967
USER.B        9.12   0.75   0.55   247.2  0.082  0.079   0.000   0.779
USER.D       45.22   0.13   0.10   330.3  0.001  0.007   0.864   0.000
USER.F        0.82   0.85   0.84    17.6  0.001  0.002   0.743   0.000
USER.H        7.01  -0.07  -0.13    40.0  0.003  0.002   0.548   0.000
USER.L        1.44   0.66   0.59     7.8  0.009  0.003   0.195   0.001
USER.N       23.87   0.21   0.13   187.3  0.006  0.002   0.222   0.000
USER.P        1.64   0.77   0.63    17.1  0.048  0.047   0.000   0.096
USER.T       12.11   0.52   0.39   111.3  0.022  0.014   0.059   0.852

Table 2.5 : Parameter Estimation for AR(1) Model
            z(t) = w + a1 z(t-1) + e(t)

Process          w     a1      se     R2  R2adj   P-VAL   P-VAL
Name                                             F-test  Chi-sq

COMP2.G63    14.39   0.41    77.6  0.172  0.167   0.000       0
ECL.B1        7.33   0.34    18.0  0.113  0.102   0.002       0
ECL.B2       60.56   0.19   373.8  0.036  0.032   0.002       0
ECL.S1       49.31  -0.01   309.3  0.000  0.002   0.893       0
ECL.S2       69.39   0.11   526.6  0.011  0.007   0.085       0
FORTR.P21     5.19   0.05     7.9  0.003  0.006   0.572       0
FORTR.P30     6.07   0.05     7.7  0.002  0.002   0.485       0
FORTR.P8      5.57   0.10     6.8  0.010  0.007   0.058       0
FORTR.Q17     4.44   0.10     6.1  0.024  0.020   0.014       0
FRCDO.C1    727.58   0.10  1277.3  0.039  0.032   0.019       0
FRCDO.C11   169.95   0.08   579.0  0.006  0.001   0.339       0
M786S.U1      1.06   0.33     5.5  0.110  0.108   0.000       0
MAIN.Q10      4.75   0.10     6.7  0.184  0.180   0.000       0
MAIN.Q19      4.79   0.41     5.5  0.161  0.152   0.000       0
MAIN.R55      2.81   0.10     3.4  0.031  0.023   0.046       0
P.A19         6.99   0.24    22.9  0.057  0.052   0.000       0
PIP.G18       0.94   0.22     0.7  0.041  0.034   0.017       0
PIP.G45       0.17   0.87     0.7  0.441  0.434   0.000       0
PIP.G60       0.49   0.49     0.4  0.164  0.160   0.000       0
SOS.A21       1.92   0.02     2.7  0.001  0.002   0.622       0
SOS.A22       1.68   0.18     2.7  0.031  0.019   0.107       0
SOS.A23       1.57  -0.03     1.4  0.001  0.009   0.766       0
SOS.A6        2.66   0.06     3.1  0.004  0.006   0.538       0
TECO.B8       4.17   0.10     6.6  0.015  0.004   0.246       0
TECO.F1       2.93  -0.08     4.7  0.007  0.004   0.440       0
TECO.F20      4.92   0.14     6.3  0.021  0.016   0.041       0
TECO.G37     27.70   0.01   122.7  0.000  0.009   0.896       0
TECO.G38     16.89   0.02    65.0  0.000  0.004   0.753       0
TECO.G55      2.87   0.34     4.3  0.113  0.105   0.000       0
TECO.H1       1.60   0.28     2.7  0.079  0.074   0.000       0
TECO.J5       4.84   0.10     6.3  0.059  0.048   0.021       0
TECO.P1       4.01   0.14    12.8  0.021  0.013   0.092       0
TECO.P13      3.67   0.15     5.6  0.023  0.011   0.170       0
USER.B       29.04   0.21   251.9  0.046  0.044   0.000       0
USER.D       50.41   0.04   329.7  0.001  0.003   0.586       0
USER.F        5.54   0.01    17.5  0.000  0.001   0.782       0
USER.H        6.18   0.05    39.9  0.003  0.000   0.276       0
USER.L        4.11   0.04     7.8  0.002  0.001   0.433       0
USER.N       27.82   0.08   187.1  0.006  0.004   0.085       0
USER.P        5.82   0.18    17.2  0.033  0.032   0.000       0
USER.T       21.97   0.12   111.4  0.015  0.012   0.044       0
Table 2.6 : Parameter Estimation for MA(1) Model
            z(t) = w + e(t) - b1 e(t-1)

Process          w     b1      se     R2  R2adj P-VALUE P-VALUE
Name                                             F-test  Chi-sq

COMP2.G63    24.76  -0.31    79.7  0.127  0.121   0.000   0.000
ECL.B1       11.13  -0.32    18.0  0.112  0.101   0.002   0.921
ECL.B2       74.70  -0.14   375.6  0.026  0.023   0.009   0.270
ECL.S1       49.00   0.01   309.3  0.000  0.002   0.891   0.995
ECL.S2       77.55  -0.11   526.5  0.011  0.008   0.080   0.000
FORTR.P21     5.49  -0.05     7.9  0.002  0.007   0.605   0.283
FORTR.P30     6.36  -0.04     7.7  0.002  0.003   0.522   0.498
FORTR.P8      6.21  -0.09     6.8  0.009  0.006   0.079   0.002
FORTR.Q17     5.26  -0.11     6.1  0.017  0.013   0.039   0.012
FRCDO.C1    610.29   0.40  1253.3  0.075  0.068   0.001   0.070
FRCDO.C11   184.04  -0.09   578.6  0.007  0.001   0.291   0.001
M786S.U1      1.58  -0.38     5.4  0.127  0.125   0.000   0.000
MAIN.Q10      8.39  -0.34     6.9  0.134  0.130   0.000   0.000
MAIN.Q19      8.17  -0.26     5.7  0.102  0.093   0.001   0.000
MAIN.R55      3.42  -0.14     3.4  0.024  0.017   0.078   0.532
P.A19         9.19  -0.17    23.1  0.039  0.035   0.003   0.000
PIP.G18       1.19  -0.17     0.7  0.031  0.024   0.036   0.392
PIP.G45       1.03  -0.43     0.8  0.213  0.204   0.000   0.054
PIP.G60       0.95  -0.30     0.4  0.098  0.094   0.000   0.000
SOS.A21       1.96  -0.02     2.7  0.000  0.002   0.665   0.000
SOS.A22       2.04  -0.17     2.7  0.029  0.017   0.122   0.959
SOS.A23       1.53   0.04     1.4  0.001  0.009   0.741   0.372
SOS.A6        2.83  -0.06     3.1  0.003  0.006   0.542   0.665
TECO.B8       3.71   0.09     6.6  0.011  0.000   0.316   0.940
TECO.F1       2.71   0.10     4.7  0.007  0.004   0.443   0.633
TECO.F20      5.76  -0.12     6.3  0.017  0.012   0.067   0.178
TECO.G37     28.04  -0.01   122.7  0.000  0.009   0.897   0.303
TECO.G38     17.26  -0.02    65.0  0.000  0.004   0.761   0.148
TECO.G55      4.35  -0.24     4.3  0.077  0.069   0.003   0.210
TECO.H1       2.24  -0.21     2.8  0.059  0.053   0.002   0.262
TECO.J5       6.41  -0.24     6.3  0.059  0.048   0.021   0.805
TECO.P1       4.69  -0.14    12.8  0.021  0.014   0.091   0.000
TECO.P13      4.34  -0.12     5.6  0.018  0.006   0.221   0.911
USER.B       36.98  -0.16   253.5  0.034  0.032   0.000   0.000
USER.D       52.25  -0.04   329.7  0.001  0.003   0.587   0.000
USER.F        5.59  -0.01    17.5  0.000  0.001   0.783   0.000
USER.H        6.53  -0.05    39.9  0.003  0.000   0.274   0.000
USER.L        4.28  -0.03     7.8  0.001  0.001   0.473   0.001
USER.N       30.22  -0.08   187.2  0.006  0.004   0.089   0.000
USER.P        7.10  -0.15    17.3  0.027  0.027   0.000   0.000
USER.T       25.11  -0.10   111.6  0.013  0.009   0.069   0.754
Table 2.7 : Parameter Estimation for AR(2) Model
            z(t) = w + a1 z(t-1) + a2 z(t-2) + e(t)

Process          w     a1     a2      se     R2  R2adj   P-VAL   P-VAL
Name                                                    F-test  Chi-sq

COMP2.G63    12.71   0.37   0.11    77.4  0.183  0.172   0.000   0.137
ECL.B1        8.00   0.37  -0.09    18.0  0.120  0.097   0.007   0.906
ECL.B2       49.42   0.15   0.18   368.4  0.067  0.060   0.000   0.980
ECL.S1       49.94  -0.01  -0.01   309.6  0.000  0.004   0.956   0.991
ECL.S2       71.08   0.11  -0.02   527.4  0.012  0.004   0.211   0.000
FORTR.P21     4.62   0.05   0.11     7.9  0.014  0.004   0.456   0.214
FORTR.P30     5.46   0.04   0.10     7.7  0.012  0.003   0.260   0.766
FORTR.P8      5.09   0.09   0.08     6.7  0.017  0.012   0.048   0.010
FORTR.Q17     3.51   0.10   0.21     6.0  0.065  0.058   0.000   0.325
FRCDO.C1    888.04   0.10  -0.22  1251.6  0.084  0.071   0.002   0.105
FRCDO.C11   187.83   0.10  -0.10   577.8  0.016  0.004   0.281   0.000
M786S.U1      1.19   0.37  -0.13     5.4  0.124  0.121   0.000   0.000
MAIN.Q10      3.69   0.34   0.21     6.6  0.220  0.213   0.000   0.582
MAIN.Q19      3.32   0.31   0.27     5.3  0.220  0.203   0.000   0.002
MAIN.R55      2.43   0.10   0.13     3.4  0.048  0.033   0.045   0.665
P.A19         5.35   0.18   0.23    22.3  0.106  0.098   0.000   0.000
PIP.G18       0.81   0.18   0.14     0.7  0.056  0.042   0.020   0.428
PIP.G45      -0.03   0.10   0.38     0.6  0.52?  0.514   0.000   0.911
PIP.G60       0.30   0.33   0.36     0.3  0.232  0.225   0.000   0.024
SOS.A21       1.62   0.02   0.15     2.7  0.023  0.019   0.007   0.000
SOS.A22       1.58   0.17   0.06     2.7  0.034  0.011   0.237   0.964
SOS.A23       1.72  -0.03  -0.09     1.4  0.009  0.011   0.628   0.449
SOS.A6        2.62   0.06   0.01     3.1  0.004  0.015   0.819   0.595
TECO.B8       3.47   0.10   0.16     6.6  0.041  0.019   0.159   0.994
TECO.F1       2.92  -0.08   0.00     4.7  0.007  0.016   0.744   0.565
TECO.F20      4.30   0.13   0.13     6.3  0.036  0.026   0.027   0.397
TECO.G37     27.51   0.01   0.01   123.2  0.000  0.018   0.989   0.243
TECO.G38     16.21   0.02   0.04    65.1  0.002  0.007   0.802   0.117
TECO.G55      2.20   0.26   0.23     4.2  0.157  0.142   0.000   0.965
TECO.H1       1.36   0.24   0.15     2.7  0.100  0.089   0.000   0.751
TECO.J5       4.98   0.25  -0.03     6.3  0.059  0.038   0.069   0.752
TECO.P1       4.08   0.15  -0.02    12.8  0.021  0.006   0.239   0.000
TECO.P13      3.25   0.14   0.11     5.6  0.035  0.011   0.241   0.987
USER.B       23.51   0.17   0.19   247.6  0.080  0.077   0.000   0.684
USER.D       50.35   0.04   0.00   330.3  0.001  0.007   0.862   0.000
USER.F        5.49   0.01   0.01    17.6  0.000  0.003   0.944   0.000
USER.H        6.22   0.05  -0.01    40.0  0.003  0.002   0.549   0.000
USER.L        3.69   0.04   0.10     7.8  0.012  0.006   0.116   0.001
USER.N       27.53   0.08   0.01   187.3  0.006  0.002   0.222   0.000
USER.P        5.28   0.16   0.09    17.1  0.041  0.040   0.000   0.003
USER.T       19.67   0.11   0.10   111.0  0.026  0.018   0.033   0.937
Table 2.8 : Parameter Estimation for MA(2) Model
            z(t) = w + e(t) - b1 e(t-1) - b2 e(t-2)

Process          w     b1     b2      se     R2  R2adj   P-VAL   P-VAL
Name                                                    F-test  Chi-sq

COMP2.G63    24.75  -0.39  -0.21    77.8  0.175  0.165   0.000   0.051
ECL.B1       11.09  -0.38  -0.10    18.0  0.121  0.099   0.006   0.891
ECL.B2       74.57  -0.12  -0.17   370.8  0.055  0.047   0.001   0.834
ECL.S1       49.00   0.01   0.01   309.6  0.000  0.004   0.961   0.991
ECL.S2       77.56  -0.11   0.01   527.4  0.012  0.004   0.213   0.000
FORTR.P21     5.48  -0.01  -0.07     7.9  0.008  0.010   0.640   0.198
FORTR.P30     6.36  -0.03  -0.08     7.7  0.009  0.000   0.350   0.682
FORTR.P8      6.20  -0.08  -0.06     6.8  0.014  0.008   0.090   0.005
FORTR.Q17     5.25  -0.12  -0.15     6.0  0.049  0.041   0.002   0.171
FRCDO.C1    615.91   0.31   0.21  1226.4  0.121  0.108   0.000   0.429
FRCDO.C11   184.48  -0.07   0.09   578.4  0.014  0.002   0.330   0.001
M786S.U1      1.58  -0.38   0.00     5.4  0.127  0.123   0.000   0.000
MAIN.Q10      8.38  -0.31  -0.19     6.8  0.163  0.155   0.000   0.003
MAIN.Q19      8.13  -0.30  -0.22     5.5  0.170  0.153   0.000   0.000
MAIN.R55      3.41  -0.14  -0.16     3.4  0.047  0.032   0.047   0.702
P.A19         9.16  -0.13  -0.23    22.5  0.087  0.079   0.000   0.000
PIP.G18       1.20  -0.18  -0.18     0.7  0.057  0.043   0.018   0.398
PIP.G45       1.03  -0.44  -0.56     0.7  0.416  0.401   0.000   0.964
PIP.G60       0.95  -0.35  -0.27     0.4  0.162  0.155   0.000   0.000
SOS.A21       1.96   0.01  -0.13     2.7  0.019  0.014   0.018   0.000
SOS.A22       2.04  -0.16  -0.02     2.7  0.029  0.005   0.299   0.941
SOS.A23       1.53   0.05   0.10     1.4  0.011  0.009   0.581   0.434
SOS.A6        2.83  -0.06  -0.01     3.1  0.003  0.015   0.830   0.593
TECO.B8       3.70   0.12  -0.16     6.6  0.040  0.018   0.172   0.992
TECO.F1       2.71   0.08  -0.01     4.7  0.007  0.016   0.744   0.565
TECO.F20      5.76  -0.11  -0.09     6.3  0.028  0.018   0.064   0.276
TECO.G37     28.03  -0.01  -0.01   123.2  0.000  0.018   0.990   0.243
TECO.G38     17.25   0.00  -0.04    65.1  0.002  0.008   0.847   0.110
TECO.G55      4.33  -0.23  -0.27     4.2  0.137  0.122   0.000   0.819
TECO.H1       2.24  -0.24  -0.20     2.7  0.095  0.084   0.000   0.743
TECO.J5       6.41  -0.25  -0.03     6.3  0.060  0.038   0.068   0.757
TECO.P1       4.69  -0.15   0.01    12.8  0.021  0.006   0.240   0.000
TECO.P13      4.32  -0.15  -0.16     5.5  0.041  0.017   0.185   0.991
USER.B       36.95  -0.15  -0.16   249.9  0.063  0.060   0.000   0.096
USER.D       52.27  -0.04  -0.00   330.3  0.001  0.007   0.862   0.000
USER.F        5.59  -0.01  -0.01    17.6  0.000  0.003   0.947   0.000
USER.H        6.53  -0.05   0.00    40.0  0.003  0.002   0.548   0.000
USER.L        4.28  -0.01  -0.11     7.8  0.012  0.007   0.110   0.001
USER.N       30.21  -0.08  -0.02   187.3  0.006  0.002   0.224   0.000
USER.P        7.10  -0.16  -0.09    17.2  0.036  0.035   0.000   0.000
USER.T       25.07  -0.12  -0.11   111.0  0.026  0.019   0.032   0.941
Table 2.9 : Comparison of Different Models

              Net Reduction in Total SSR
Model         Program Processes   User Processes

ARMA(1,1)          6.3%                3.9%
AR(1)              2.6%                2.3%
MA(1)              4.0%                1.7%
AR(2)              5.2%                3.8%
MA(2)              6.6%                3.0%


The total reductions in SSR for various model types are listed in
Table 2.9. The reductions have been expressed as a fraction of the total
SSR for the 0th order model ( $z(t) = \bar{z} + e(t)$ ). We see that for
program processes the maximum reduction achievable is only 6.6% if we
choose the MA(2) model. The gain is only 3.9% in case of USER processes.
The next question is whether with this little reduction it is worthwhile
having a two parameter model. In our judgment*, it is too much work for too


* The analysis presented in this section is more of a qualitative nature
than quantitative. Hence, personal preferences and biases of the
analysts may well affect the final conclusion. However, it is the
approach rather than the result that we deem more important. It is
quite possible for some analyst to disagree with our conclusions.
However, they can still follow our approach and come up with a
scheduling algorithm based on control theoretic arguments.


little gain and the zeroth order model is good enough. Our conclusion
is also backed up by many of the observations made during the
identification step, viz., violent variations in the process, small
values of ACF and PACF, non-zero probabilities in chi-square tests, etc.
2.6 SCHEDULING ALGORITHM BASED ON THE ZEROTH ORDER MODEL


The net conclusion of the analysis so far is that the CPU demand
behavior of programs is best represented by the following 0th order
model:

$$z(t) = \bar{z} + e(t)$$

Since e(t) is uncorrelated zero mean noise, it cannot be predicted, and
the best estimate of the future CPU demand is its mean value, i.e.,

$$\hat{z}(t) = \bar{z}, \qquad \text{where } \bar{z} = \frac{1}{N} \sum_{t=1}^{N} z(t)$$

The problem in using the above formula is that $\bar{z}$ can be calculated
only after all values of z(t), t=1,2,...,N are known. What we need now is
an adaptive technique to calculate $\bar{z}$ and update it each time a new
observation is obtained. Some of the possible adaptive methods are
discussed below.


1. Current mean: Average of all values observed up to t-1.

$$\bar{z}_t = \frac{1}{t-1} \sum_{k=1}^{t-1} z(k), \qquad t > 1$$

$$\bar{z}_t = (1 - a_t)\,\bar{z}_{t-1} + a_t\, z(t-1), \qquad \text{where } a_t = \frac{1}{t-1}$$

Here, $\bar{z}_t$ denotes the current estimate of the mean.
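
The recursive form of the current mean can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `running_means` and the sample CPU bursts are hypothetical, and the gain is taken as 1/(t-1), the value that makes the recursion reproduce the plain average.

```python
def running_means(z):
    """Return the successive mean estimates after each observation.

    z[0] plays the role of z(1). After n observations have been folded in,
    the estimate equals the plain average of z(1)..z(n), i.e. the current
    mean available for predicting z(n+1).
    """
    estimates = []
    zbar = 0.0
    for n, value in enumerate(z, start=1):   # n = t - 1 observations so far
        a = 1.0 / n                          # gain a_t = 1/(t-1) (assumed)
        zbar = (1.0 - a) * zbar + a * value  # recursive update of the mean
        estimates.append(zbar)
    return estimates


bursts = [4.0, 2.0, 6.0, 8.0]                # hypothetical CPU demands
print(running_means(bursts))
```

Only the previous estimate and a count need to be stored, so no history of past demands is kept.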
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for  CPU  allocation. 




Notice that the SPRPT algorithm does not require any extra
bookkeeping other than what is already done by the operating system. Most
operating systems record CPU time used by programs for accounting and
billing purposes.


2.7 CONCLUSION

A control theoretic formulation of the CPU management problem has
been presented. The problem has been formulated as one of predicting
the future CPU demand of a job based on its previous demands. Several
analytical expressions for the effect of prediction errors on the mean
finishing time of tasks have been derived. The results of an experiment
to study the behavior of actual programs have been reported. The
empirical study shows that the CPU demands of programs follow a white
noise model. The best least-squares predictor for the next CPU burst
is, therefore, the current mean. Three different schemes for adaptive
prediction have been proposed. An adaptive scheduling algorithm called
SPRPT has been proposed.


3.1 PROBLEM STATEMENT

Memory management is the technique whereby an operating system
creates an illusion of virtually unlimited memory even though the actual
physical memory is limited. Thus, a user program having a memory
requirement larger than the available physical main memory can be run on
the system. This is accomplished by dividing the user program into
several equal size (say 1K words) pieces called pages. The whole
program is stored on a secondary memory (drum or disk) and only a few
pages are loaded in the primary (core) memory. The program is then
allowed to run. Obviously, the program will be interrupted when it
tries to reference a page that is not in the primary memory. This
situation is called a "page fault".


On a page fault, the demanded page is brought into the core.
Space for the incoming page is obtained by removing either a page of
this same program or a page of some other program residing in the core.
In the first case, the total core memory available to each program remains
fixed, and in the second case, it varies with time. The former scheme
is known as fixed partitioning and the latter as variable or dynamic
partitioning. In either case, when a new page is brought in, an old
page must be removed from the core. The page to be removed is
determined by using a page replacement algorithm. Thus, the chief
problem in memory management is that of page replacement.


Intuitively, the best page to remove is the one that will never be
needed again or, at least, not for a long time. In fact, it has been
proved that for fixed memory partitioning, the best page to remove is
the one that will not be referenced for the longest interval of time.
This policy, called 'MIN', is optimal in the sense that it minimizes the
total number of page faults [Bel66]. However, this requires advance
knowledge of the future page references (a prediction problem!). A
realizable approximation to MIN is the Least Recently Used (LRU) policy,
which assumes that the page that has not been referenced for the longest
interval in the past is the one that will not be referenced for the
longest interval in the future, and is the candidate for replacement.

In case of variable memory partitioning, it has been shown that
MIN and its LRU approximations are not optimal. The optimal page
replacement policy in this case (called the VMIN algorithm) is to remove all
those pages that will not be referenced during the next T time interval
(t, t+T), where T = R/U is the ratio of the cost of bringing a new page
into the main memory from secondary memory to the cost of keeping a page
in the main memory for unit time [PrF76]. Again, this is only of
theoretical interest, because it requires knowledge of the future page
reference string.

A realizable approximation to the VMIN policy is the Working Set (WS)
policy [Den68]. According to this policy, the pages most likely to be
referenced in the next T interval (t, t+T) are those which have been
referenced during the last T interval (t-T, t). All other pages can
therefore be removed. The interval T is called the window size.

Both  LRU  and  WS  try  to  predict  the  future  reference  pattern  from 
the  past  behavior  of  the  program.  Efficient  operation  of  these 
algorithms  is  dependent  upon  the  degree  of  locality  of  reference  in 
programs.  In  statistical  terms,  the  principle  of  locality  states  that 
there  is  a high  correlation  between  the  immediate  future  and  the  recent 
past  behavior  of  a program. 


Memory  Management 
Control-theoretic  Formulation 


Page  3-5 


3.2 CONTROL THEORETIC FORMULATION

It is obvious from the previous discussion that the problem of
page replacement is a prediction problem. If we can somehow model the
page reference string as a stochastic process, we can use modern control
theoretic prediction algorithms, such as the Wiener filter or the Kalman
filter, to predict the future page reference string.

There  are  many  ways  to  model  the  reference  string  as  a stochastic 
process.  Ideally  the  model  should  be  such  that  it  incorporates  all  the 
information  contained  in  the  page  reference  string.  However,  such  a 
model  becomes  very  complex  and  difficult  to  analyze.  We,  therefore, 
choose  to  begin  with  a rather  simple  stochastic  process  model  suggested 
by  Arnold  [Arn75].  More  complexity  may  be  introduced  in  the  future 
work.  The  implications  of  this  simplification,  and  limitations  of  the 
conclusion  drawn  from  this  model  are  discussed  in  the  last  section  of 
this  chapter.  It  turns  out  that  even  this  simplified  model  gives  us 
much  useful  insight  in  to  the  problem.  The  stochastic  process  is, 
therefore,  described  next. 

The page reference pattern of a given (say i-th) page of a program
can be modeled as a zero-one process as follows:

$$z(k) = \begin{cases} 1 & \text{if the page is referenced in the } k\text{th interval } ((k-1)T < t < kT) \\ 0 & \text{otherwise} \end{cases}$$
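
As an illustration, the zero-one process for one page can be built from a list of reference times. This is a minimal sketch: the function name `reference_indicator`, the sample times, and the choice to assign a reference falling exactly on a boundary kT to interval k are all assumptions made here.

```python
import math


def reference_indicator(ref_times, T, n_intervals):
    """Return [z(1), ..., z(n_intervals)] for one page.

    z(k) is 1 if the page is referenced at least once in the k-th interval
    of length T, and 0 otherwise.
    """
    z = [0] * n_intervals
    for t in ref_times:
        k = math.ceil(t / T)          # interval with (k-1)T < t <= kT (assumed)
        if 1 <= k <= n_intervals:
            z[k - 1] = 1
    return z


# Hypothetical reference times for a single page, with T = 1.0
print(reference_indicator([0.5, 0.7, 3.2, 6.9], T=1.0, n_intervals=7))
```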



A sample trajectory of the process is shown in Figure 3.1. The
problem of page replacement is that of predicting z(k) given the trajectory
up to time (k-1)T, i.e., finding the best estimate $\hat{z}(k)$ of z(k) from
measurements up to time (k-1)T. This problem is well known in control
theory, where much work has been done on the prediction of stochastic
processes.


Figure 3.1: References to a page modelled as a binary stochastic process
(time axis marked at T, 2T, ..., 7T)
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3.3  COST  EXPRESSION 

In  this  section  we  derive  an  expression  for  the  cost  of  imperfect 
prediction.  In  memory  management  with  fixed  memory  partitioning, 
generally  the  objective  is  to  minimize  the  chances  of  page  faults.  In 
the  case  of  variable  partitioning,  however,  page  faults  alone  do  not 
provide  an  adequate  criterion.  This  is  because  it  is  always  possible  to 
reduce  page  faults  for  one  particular  program  by  giving  it  more  memory. 
This,  however,  penalizes  other  programs  which  must  operate  with  less 
memory.  Thus,  an  additional  objective  is  to  keep  memory  usage  also  to  a 
minimum.  This  second  cost  is  often  referred  to  as  space  time  product. 
The  total  cost  is,  therefore,  calculated  as  follows. 

Let R = Cost of a page fault
      = Cost of bringing a new page into memory
and U = Cost of memory usage
      = Cost of withholding one page of memory from other
        users for unit time

Let $\hat{z}(k)$ denote the predicted value of z(k) from information
available at time (k-1)T. Due to imperfect knowledge of the future, z(k)
and $\hat{z}(k)$ are not the same. A price has to be paid for errors in
prediction. If both z and $\hat{z}$ can take only 0, 1 values, then there
are only 4 cases to be considered as shown in Table 3.1.

Thus the additional cost due to imperfect prediction of z is given
by:

$$C = R\, z\,(1-\hat{z}) + U T\,(1-z)\,\hat{z}$$
TABLE 3.1 : Costs of memory management

              Decision       Additional
z    z-hat    based on       cost         Remark
              z-hat
0    0        Remove         0
0    1        Keep           UT           The page is not referenced
                                          but still kept.
1    0        Remove         R            A page fault occurs.
1    1        Keep           0            The page is referenced and
                                          it is in the memory.

Our aim should be to choose $\hat{z}$ such that the expected cost E[C] is
minimum.

$$E[C] = E[\,R\, z\,(1-\hat{z}) + U T\,(1-z)\,\hat{z}\,]$$

If we choose our decision interval T such that T = R/U or R = UT, we have

$$E[C] = E[\,R\,(z\,(1-\hat{z}) + (1-z)\,\hat{z})\,] = E[\,R\,(z-\hat{z})^2\,] = R\, E[(z-\hat{z})^2]$$

i.e., R times the mean square prediction error.
Note that the second equality above holds only if both z and $\hat{z}$ are
zero-one valued variables, not otherwise.
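
The four cases of Table 3.1, and the claim that the cost reduces to R times the squared prediction error when R = UT, can be checked by direct enumeration. The numerical values of R and U below are hypothetical; only the relation T = R/U is taken from the text.

```python
from itertools import product

R = 10.0          # hypothetical cost of a page fault
U = 2.0           # hypothetical cost of holding one page for unit time
T = R / U         # decision interval chosen so that R = U*T


def additional_cost(z, z_hat):
    """Cost of acting on the prediction z_hat when the true value is z."""
    return R * z * (1 - z_hat) + U * T * (1 - z) * z_hat


for z, z_hat in product((0, 1), repeat=2):
    # For zero-one values the cost equals R times the squared error.
    assert additional_cost(z, z_hat) == R * (z - z_hat) ** 2
    print(z, z_hat, additional_cost(z, z_hat))
```

The identity relies on z*z = z for zero-one values, which is why it fails for a predictor whose output is not restricted to 0 and 1.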


A classic solution to the least square prediction problem is due
to Wiener [Pap65, p. 408]. It consists of designing a linear system
(Wiener filter) with impulse response h(u) such that the output of the
system is the estimate $\hat{z}(t)$ when the input is z(k), 0 ≤ k ≤ t-1 (see
Figure 3.2).

$$\hat{z}(t) = \sum_{u=1}^{\infty} h(u)\, z(t-u)$$

The impulse response h(u) can be obtained by solving the Wiener-Hopf
equation:

$$C(k) = \sum_{u=1}^{\infty} C(k-u)\, h(u), \qquad k = 1, 2, \ldots$$

where C(k) = autocorrelation function of z(t). The memory management
problem can, therefore, be solved by measuring C(k) and solving the
Wiener-Hopf equation.

Figure 3.2: Wiener filter predictor. Past observations z(t-k) are the input
to a linear system with impulse response h(u); the output is the predicted
value $\hat{z}(t)$. h(u) is given by the solution to the Wiener-Hopf equation.

Strictly speaking, the Wiener filter technique is not applicable to
binary processes. For example, the output of the predictor will not
necessarily be 0 or 1. It may take any value. The analysis is, therefore,
approximate. The reason for the choice of this method for initial
analysis is that there is no way of modeling binary processes* as
convenient as that available for continuous processes. In fact, the
techniques for modeling, estimation, and prediction of continuous processes
are so well developed that it is no longer necessary to solve the
Wiener-Hopf equation in order to find the optimal predictor. Simply by
looking at the shape of the autocorrelation function, it is possible to
guess the model of the system that could have generated the process
[BoJ70, Nel73]. For example, an exponentially decaying autocorrelation
function implies an AR(1) model, i.e., the impulse response h(u) (the
solution of the Wiener-Hopf equation) is zero everywhere except at u=1.
These Time Series Analysis Techniques provide very convenient means for
modeling empirical data.
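
The AR(1) remark can be checked numerically. The sketch below truncates the Wiener-Hopf equation to m unknowns, C(k) = sum over u = 1..m of C(|k-u|) h(u) for k = 1..m, and solves it by Gaussian elimination. The truncation, the helper names `solve_linear` and `wiener_hopf`, and the decay rate 0.6 are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def wiener_hopf(C, m):
    """Return [h(1), ..., h(m)] for an autocorrelation function C(lag)."""
    A = [[C(abs(k - u)) for u in range(1, m + 1)] for k in range(1, m + 1)]
    b = [C(k) for k in range(1, m + 1)]
    return solve_linear(A, b)


# Exponentially decaying autocorrelation C(k) = 0.6**k
h = wiener_hopf(lambda k: 0.6 ** k, m=4)
print([round(v, 6) for v in h])
```

For this autocorrelation only h(1) comes out non-zero (equal to the decay rate), which is the one-term predictor of an AR(1) model.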

* We have developed some techniques for modeling binary processes.
These techniques and their applications to the page reference process are
described in the next chapter. In this chapter we report the results
using conventional techniques.
Arnold [Arn75] has reported the results of autocorrelation
measurements on a number of programs. His conclusion is that in most
cases the autocorrelation function has the following form:

$$C(k) = p + (1-p)\, q^{k} \qquad \text{with } p > 0 \text{ and } q = \text{constant}$$

Arnold reports in his paper that one of the main findings of his
measurements is the fact that the autocorrelation function does not go
to zero, i.e., the constant p ≠ 0.

An  important  implication  of  this  observation  is  that  the  page 
reference process is a non-stationary stochastic process. In fact, a
commonly  used  test  for  stationarity  is  to  verify  that  the 
autocorrelation  function  C(k)  dies  down  to  zero  at  large  lags  [BoJ70]. 
A simple  explanation  is  that  if  the  correlation  between  z(k)  and  z(0)  is 
zero  for  large  k,  the  effect  of  the  initial  conditions  will  not  be  felt 
after  large  enough  k,  and  the  process  will  eventually  reach  a state  of 
"statistical  equilibrium"  called  stationarity. 

There is an unlimited number of ways in which a process can be
non-stationary. However, most of the real world non-stationary
processes exhibit a "homogeneous" non-stationary behavior such that some
suitable difference of the process is stationary. For example, if the
process exhibits homogeneity in the sense that apart from local level
(i.e., local mean), one part of the series behaves much like any other
part, then the first difference of the process may be found to be
stationary.


To model such non-stationary processes, therefore, one studies the
autocorrelation function of the 1st, 2nd, 3rd, ... differences until a
stationary process is obtained. Thus, to model z(t) we should study the
autocorrelation functions of

$$D^{d} z(t) = D^{d-1} z(t) - D^{d-1} z(t-1)$$

till a stationary process $D^{d} z(t)$ is found.
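
A sketch of this identification step follows: difference the series, then inspect the sample autocorrelation function at each level of differencing. The function names and the sample trace are hypothetical, and the sample ACF uses the usual biased estimator, which the report does not specify.

```python
def difference(x, d=1):
    """Apply the differencing operator d times: x(t) - x(t-1)."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical zero-one reference trace for one page
z = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1]
y = difference(z)                  # first difference y(t) = z(t) - z(t-1)
print(sample_acf(z, 4))
print(sample_acf(y, 4))
```

Differencing stops at the first level whose sample ACF dies out at large lags.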


The non-stationarity of the page reference process z(t) can be
explained as follows. Even though the program behavior may be
stationary in one locality, the frequency of reference to a particular
page varies as the program progresses from one locality to the next.
Thus, the process z(t) may behave like a set of locally stationary
processes, i.e., like a homogeneous non-stationary process whose mean
value varies. If this is so, the first difference of z(t) must be
stationary. At this point this is just a hypothesis. The
identification results presented in the next section confirm this
hypothesis.
This section describes the results of modeling the 1st and higher
differences of the page reference process. The data for analysis was
supplied by Arnold. It consisted of a reference string trace of the
MUDDLE compiler. About 5 different pages were chosen for analysis.

The autocorrelation functions of some of the pages studied are
shown in Figure 3.3. The broken lines indicate the 95% confidence
interval of the ACF for the given sample. It is obvious from this
figure that the process is non-stationary and further differencing is
necessary. The first difference process y(t) is defined as

$$y(t) = z(t) - z(t-1)$$

In almost all cases studied the first differences turned out to be
stationary. Sample autocorrelation (ACF) and partial autocorrelation
(PACF) functions are shown in Figure 3.4. The common characteristics of
these functions and the inferences that we can draw from these are now
described.

A. The ACF cuts off at large lags. This implies that the 1st
differences are stationary and no further differencing is necessary.
Thus the appropriate model for the page reference process z(t) would be
an ARIMA(p,1,q) model (Auto-Regressive Integrated Moving Average model
of order p,1,q) for some suitable p and q.
B. The mean of the difference process is almost zero. This means that
the constant term in the ARMA model for y(t) could be taken as zero.
This property of y(t) is least surprising because a little arithmetic
shows that this must be so.

$$y(t) = z(t) - z(t-1)$$

Hence,

$$\text{mean of } y(t) = \frac{1}{N-1} \sum_{t=2}^{N} y(t) = \frac{z(N) - z(1)}{N-1} = 0 \ \text{or}\ \pm\frac{1}{N-1} \approx 0$$
C. Both ACF and PACF are largest at lag 1, after which they die out
slowly. Of course, there are a few jumps at other lags* and the fall is
not smooth. The main point is that C(1) and $\phi(1)$ are not insignificant
as was the case for CPU demand processes. Therefore, a suitable low order
model for y(t) is the ARMA(1,1) model. To see this clearly, consider the


* Some pages show periodic peaks in the ACF even at large lags. This
happens for the pages that are in a big (with respect to interval T)
program loop. If the total time of the loop is kT (say), the page is
referenced every k intervals. Thus z(t) and z(t+k) are highly
correlated, and so are z(t) and z(t+jk), j=2,3,... This will cause
peaks in the ACF at lags jk, j=1,2,3,... In conventional time series
analysis this behavior is called "seasonal". Though it is not very
difficult to model this behavior, we will not consider this here in
order to keep the analysis simple.
ARMA model:

$$y(t) - a\,y(t-1) = e(t) - b\,e(t-1)$$

The ACF and PACF of this process for different relative values of
parameters a and b are shown in Figure 3.5. The exact expressions are
as follows:

$$C(0) = 1, \qquad \phi(0) = 1$$

$$C(1) = \frac{(1-ab)(a-b)}{1-2ab+b^{2}}, \qquad \phi(1) = a-b$$

$$C(k+1) = a\,C(k), \quad k>0, \qquad \phi(k+1) = b\,\phi(k), \quad k>0$$

The comparison of Figure 3.4 with Figure 3.5 shows that an
ARMA(1,1) model with b>a>0 could be used for y(t).
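
The shape described here can be reproduced from the model parameters. The sketch below uses the standard Box-Jenkins lag-1 autocorrelation of an ARMA(1,1) process (an assumption where the printed expression is incomplete) together with the geometric recursions quoted in the text; the values a = 0.4 and b = 0.8 are hypothetical.

```python
def arma11_acf(a, b, max_lag):
    """Autocorrelations C(0..max_lag) of y(t) - a*y(t-1) = e(t) - b*e(t-1)."""
    acf = [1.0, (1 - a * b) * (a - b) / (1 - 2 * a * b + b * b)]
    for _ in range(2, max_lag + 1):
        acf.append(a * acf[-1])          # C(k+1) = a * C(k)
    return acf[: max_lag + 1]


def arma11_weights(a, b, max_lag):
    """Second sequence quoted in the text: 1, a-b, then multiplied by b."""
    w = [1.0, a - b]
    for _ in range(2, max_lag + 1):
        w.append(b * w[-1])
    return w[: max_lag + 1]


a, b = 0.4, 0.8                          # hypothetical values with b > a > 0
print(arma11_acf(a, b, 4))
print(arma11_weights(a, b, 4))
```

With b > a > 0 both sequences are negative at lag 1 and then decay geometrically toward zero without changing sign.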


D. The ACF as well as PACF are negative at lag 1. This observation along
with the expressions for C(1) and $\phi(1)$ given above further confirms the
constraint guessed above, i.e., b>a. The fact that the successive
values of y(t) will always be negatively correlated can easily be seen
as follows:

$$y(t) = 1 \;\Rightarrow\; z(t) = 1 \;\Rightarrow\; y(t+1) = 0 \text{ or } -1$$
$$y(t) = -1 \;\Rightarrow\; z(t) = 0 \;\Rightarrow\; y(t+1) = 0 \text{ or } 1$$

Thus a positive value of y(t) implies that the next value will be
zero or negative and vice versa.

The parameter values obtained for the cases analyzed are listed in
Table 3.2. Notice that a and b do satisfy the constraints (b>a>0)
conjectured above. In addition we notice that b is almost always nearer
to 1 and a is nearer to 0. Also listed in the table is the relative
TABLE 3.2 : Parameter values for the ARIMA(1,1,1) model
            (1-aB)(1-B)z(t) = (1-bB)e(t)

Page #     a        b        R2
44         0.654    0.961    14.1%
79         0.446    0.856    17.3%
172        0.345    0.822    20.6%
206        0.0      0.858    41.4%
226        0.0      0.783    38.2%

reduction in the variance (R2) achieved by the model. Notice that the
model does provide significant gain in prediction efficiency over a
zeroth order model.

The fact that ARMA(1,1) is the appropriate low order model for
y(t) is further confirmed by comparing it with other low order models
like AR(1), MA(1), or AR(2). The relative efficiency of these models is
listed in Table 3.3. Notice that in all cases analyzed, the ARMA(1,1)
model turns out to be the most efficient.

Since  y(t)  is  the  first  difference  process  of  z(t),  z(t)  is  said 
to  be  the  first  "integrated"  process  of  y(t).  Thus  an  Auto  Regressive 
Moving  Average  (ARMA)  model  of  order  1,1  for  y(t)  implies  an  Auto 
Regressive  Integrated  Moving  Average  (ARIMA)  model  of  order  1,1,1  for 
z(t).  The  model  equation  for  y(t)  is 


$$(1-aB)\,y(t) = (1-bB)\,e(t)$$

where B is the backward shift operator (By(t) = y(t-1)). Further, since

$$y(t) = z(t) - z(t-1) = (1-B)\,z(t)$$

the model equation for z(t) is given by the following equation:

$$(1-aB)(1-B)\,z(t) = (1-bB)\,e(t)$$

TABLE 3.3 : Comparison of ARMA models of 1st difference process
            y(t) = z(t) - z(t-1)

List of R2 (% reduction in variance) for different models of y(t)

Page #     AR(1)     AR(2)     MA(1)     ARMA(1,1)
44         4.6%      6.0%      6.4%      14.1%
79         7.6%      13.2%     [?]       17.3%
172        13.2%     16.9%     19.1%     20.6%
206        25.0%     29.0%     41.4%     41.4%
226        25.4%     29.5%     38.2%     38.2%
The net conclusion of the last section is that the page reference
behavior of programs can be appropriately modeled by an ARIMA(1,1,1)
model:

$$(1-aB)(1-B)\,z(t) = (1-bB)\,e(t)$$

where z(t) is the binary page reference process, e(t) is white noise, B
is the backward shift operator, and a, b are model parameters with
b>a>0. Using this model, we can derive equations for prediction of z(t)
based on process observations up to time t-1. In the following, we
derive two such sets of equations called "open loop" and "closed loop"
predictors. An implementation of the open loop predictor using two
exponentially weighted averages is also discussed.


3.6.1 Open Loop Predictor: The usual way to derive a predictor for any
ARIMA model is to transform it into an equivalent AR model. For our
ARIMA(1,1,1) model, this is done as follows:

$$e(t) = \frac{(1-aB)(1-B)}{1-bB}\, z(t)$$

$$= \Big[\, 1 - (1+a-b)B - (b-a)(1-b) \sum_{i=2}^{\infty} b^{\,i-2} B^{i} \Big]\, z(t)$$

$$= z(t) - (1+a-b)\,z(t-1) - (b-a)(1-b) \sum_{i=2}^{\infty} b^{\,i-2} z(t-i)$$

or,

$$z(t) = e(t) + (1+a-b)\,z(t-1) + (b-a)(1-b) \sum_{i=2}^{\infty} b^{\,i-2} z(t-i)$$
Since e(t) is white noise and cannot be estimated or predicted
from the previous observations, the best estimate of z(t) from
observations up to time t-1 is given by:

$$\hat{z}(t) = (1+a-b)\,z(t-1) + (b-a)(1-b) \sum_{i=2}^{\infty} b^{\,i-2} z(t-i) \qquad [3.1]$$

This  equation  is  very  inconvenient  to  use  because  it  requires  knowledge 
of  all  previous  observations.  One  way  to  simplify  it  is  to  ignore  the 
higher  order  terms  (terms  with  small  coefficients).  In  practice,  terms 
after  the  5th  lag  can  be  easily  ignored  without  much  loss  in  accuracy. 

A more ingenious procedure is to rewrite equation 3.1 as
follows:

$$\hat{z}(t) = (1-c)\,z(t-1) + c\,Z(t-2) \qquad [3.2]$$

where c = b-a > 0 and Z(t-2) is defined as (1-b) times the summation term in
equation 3.1. It can be recursively calculated as follows:

$$Z(t-2) = (1-b)\,z(t-2) + b\,Z(t-3) \quad \text{with } Z(0) = 0 \qquad [3.3]$$

Notice that both equations 3.2 and 3.3 represent exponentially
weighted averages. However, the weighting coefficients in the two
equations are quite different, because b-c = a ≠ 0.
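
The equivalence of the two forms can be checked numerically. In the sketch below, `predict_direct` evaluates the weighted sum over the whole available record and `predict_ewma` uses the two exponentially weighted averages. The function names, the sample trace and the parameter values are hypothetical, and the infinite sum is truncated at the start of the record, which matches Z(0) = 0.

```python
def predict_direct(z, a, b):
    """Weighted-sum predictor; z[0] is z(1), result predicts z(len(z)+1)."""
    t = len(z) + 1
    total = (1 + a - b) * z[t - 2]                 # term in z(t-1)
    for i in range(2, t):                          # terms in z(t-i)
        total += (b - a) * (1 - b) * b ** (i - 2) * z[t - 1 - i]
    return total


def predict_ewma(z, a, b):
    """Same prediction from two exponentially weighted averages."""
    c = b - a
    Z = 0.0                                        # Z(0) = 0
    for value in z[:-1]:                           # fold in z(1)..z(t-2)
        Z = (1 - b) * value + b * Z
    return (1 - c) * z[-1] + c * Z


trace = [1, 1, 0, 1, 0, 0, 1, 1]                   # hypothetical z(1)..z(8)
print(predict_direct(trace, a=0.3, b=0.8))
print(predict_ewma(trace, a=0.3, b=0.8))
```

The recursive form needs only the latest observation and one stored average per page, instead of the full history.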


$$e(t) = \frac{(1-aB)(1-B)}{1-bB}\, z(t) = z(t) - \frac{(1+a-b) - aB}{1-bB}\, z(t-1) = z(t) - \hat{z}(t)$$

Hence

$$\hat{z}(t) = \frac{(1+a-b) - aB}{1-bB}\, z(t-1)$$

or

$$(1-bB)\,\hat{z}(t) = [\,(1+a-b) - aB\,]\, z(t-1)$$

or

$$\hat{z}(t) - b\,\hat{z}(t-1) = (1+a-b)\,z(t-1) - a\,z(t-2)$$

or

$$\hat{z}(t) = (1+a)\,z(t-1) - a\,z(t-2) - b\,[\,z(t-1) - \hat{z}(t-1)\,]$$

$$= (1+a)\,z(t-1) - a\,z(t-2) - b\,e(t-1) \qquad [3.4]$$

where

$$e(t) = z(t) - \hat{z}(t)$$

= Error in prediction at t
= Innovation sequence
The block diagram representations of the predictors given by equations
3.1 and 3.4 are shown in Figure 3.6. It is obvious from the diagrams why we
call these predictors open loop and closed loop respectively.
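
The closed loop form feeds the previous prediction error back into the next prediction. The sketch below runs it side by side with the open loop form built from exponentially weighted averages. The start-up values (no reference before the record begins and an initial prediction of 0) are assumptions chosen so that the two forms can be compared; all names and numbers are hypothetical.

```python
def closed_loop(z, a, b):
    """Return predictions for z(2)..z(len(z)+1); z[0] is z(1)."""
    preds = []
    z_prev2 = 0.0        # z(t-2): taken as 0 before the record (assumed)
    z_hat_prev = 0.0     # previous prediction: start-up value (assumed)
    for z_prev in z:     # z_prev plays the role of z(t-1)
        err = z_prev - z_hat_prev                  # e(t-1), the innovation
        z_hat = (1 + a) * z_prev - a * z_prev2 - b * err
        preds.append(z_hat)
        z_prev2, z_hat_prev = z_prev, z_hat
    return preds


def open_loop(z, a, b):
    """Same predictions from two exponentially weighted averages."""
    c = b - a
    preds, Z = [], 0.0
    for z_prev in z:
        preds.append((1 - c) * z_prev + c * Z)     # uses Z(t-2)
        Z = (1 - b) * z_prev + b * Z               # advance to Z(t-1)
    return preds


trace = [1, 1, 0, 1, 0, 0, 1, 1]                   # hypothetical trace
print(closed_loop(trace, a=0.3, b=0.8))
print(open_loop(trace, a=0.3, b=0.8))
```

Under these start-up assumptions the two lists agree to rounding error, so the choice between the forms is one of implementation convenience.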




Figure 3.6: Block diagram representation of predictors
3.7 ARIMA PAGE REPLACEMENT ALGORITHM

Using  either  the  open  loop  exponential  predictor  or  the  closed 
loop  equation  derived  above,  one  can  design  a page  replacement 
algorithm.  We  will  call  such  an  algorithm  "ARIMA  page  replacement 
algorithm".  In  the  following,  we  describe  the  algorithm  based  on  the 
exponential  predictor.  The  closed  loop  version  can,  similarly,  be 
designed.  The  algorithm  is  as  follows. 

1. Associated with each page i is a hardware register Z_i (called
Z-register). Also associated is a bit z_i (called z-bit).

2. Whenever a page is referenced its associated z-bit is set.

3. Every T interval (where T is the ratio of costs as described
before), all Z-registers are updated using the following (FORTRAN
like) statement:

Z_i = (1-b)*z_i + b*Z_i

and all z-bits are cleared.

4. When a new page is loaded in the memory the z_i bit is cleared and
Z_i is initialized to 0.

5. At the time of page replacement, which could be every T interval,
or more appropriately at page fault, a quantity called $\hat{z}_i$ is
calculated as follows:

zhat_i = (1-c)*z_i + c*Z_i

Based on $\hat{z}_i$ a decision is made regarding the page to be replaced.
There are many possible ways of making this decision. Some
examples of such decision rules are described below:

a. The page with the least $\hat{z}_i$ is replaced.

b. All pages with $\hat{z}_i$ < k, where k is some suitable cut off point
(say 0.5), are replaced.

c. The first page encountered with $\hat{z}_i$ < k is replaced.

d. Among the pages with $\hat{z}_i$ < k, the page to be replaced is
selected using LIFO, FIFO, or LRU algorithms.

Of course, one may also use a combination of two or more rules,
for example, in the case b above, if there is no page with $\hat{z}_i$ < k then
use the rule a, or try varying k depending upon the page fault frequency
and so on.
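
The steps and decision rule a above can be sketched in software as follows. This is an illustration only, not the hardware scheme of the report: the class name `ArimaPager`, its method names, the default parameter values and the choice to set the z-bit immediately after loading the faulting page are assumptions.

```python
class ArimaPager:
    """Toy simulation of the ARIMA page replacement algorithm (rule a)."""

    def __init__(self, frames, b=7 / 8, c=1 / 2):
        self.frames = frames
        self.b, self.c = b, c
        self.zbit = {}     # page -> z-bit (referenced in current interval)
        self.Z = {}        # page -> Z-register (weighted average)

    def predict(self, page):
        """Step 5: predicted reference value for the next interval."""
        return (1 - self.c) * self.zbit[page] + self.c * self.Z[page]

    def evict(self):
        """Rule a: remove the resident page with the least prediction."""
        victim = min(self.Z, key=self.predict)
        del self.Z[victim], self.zbit[victim]
        return victim

    def reference(self, page):
        """Steps 2 and 4. Returns True if the reference is a page fault."""
        fault = page not in self.Z
        if fault:
            if len(self.Z) >= self.frames:
                self.evict()
            self.Z[page] = 0.0
            self.zbit[page] = 0
        self.zbit[page] = 1          # the (retried) reference sets the bit
        return fault

    def tick(self):
        """Step 3: call once at the end of every interval T."""
        for page in self.Z:
            self.Z[page] = (1 - self.b) * self.zbit[page] + self.b * self.Z[page]
            self.zbit[page] = 0


pager = ArimaPager(frames=2)
pager.reference(1)
pager.reference(2)
pager.tick()
pager.reference(1)
pager.reference(3)                   # page 2 has the least prediction
print(sorted(pager.Z))
```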

The main overhead involved in this algorithm is the update of Z
registers every T interval. This overhead is not excessive considering
that it involves only one multiplication, one addition, and a
complementation. A simple hardware circuitry could be used to do this
task as shown in Figure 3.7. At the time of replacement the same
circuitry could be used for prediction by replacing b by c.

Figure 3.7: Hardware implementation of ARIMA algorithm

The question that we have ignored so far is what value of
parameters b and c should be used in the ARIMA algorithm. Ideally, one
would like to estimate these parameters separately for each page. The
estimation technique should be an adaptive one so that b_i and c_i are
updated along with Z_i every T interval. Alternately one could use some
suitable fixed value. This latter procedure has much less overhead and
is more practical. However, in this case, program behavior monitoring
should be done from time to time to detect drastic variations in user
program behavior. The main consideration in choosing these values is
that they should be representative of user program behavior, and also
they must be easy to represent. For example, for the pages we analyzed,
we found that the average values of b and c were 0.856 and 0.567
respectively. Therefore, b = 7/8 and c = 1/2 seem appropriate, considering
that we are going to use binary arithmetic.
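
With b = 7/8 and c = 1/2, both the update and the prediction reduce to shifts and adds on a fixed-point register. The sketch below is one possible realisation; the register width of 8 fractional bits and the function names are assumptions, not details given in the report.

```python
FRAC_BITS = 8                 # hypothetical register width (fractional bits)
ONE = 1 << FRAC_BITS          # fixed-point representation of 1.0


def update_register(Z, zbit):
    """Z <- (1/8)*z + (7/8)*Z using only a shift, a subtract and an add."""
    return Z - (Z >> 3) + (ONE >> 3 if zbit else 0)


def predict(Z, zbit):
    """Prediction (1/2)*z + (1/2)*Z, again a single shift."""
    return ((ONE if zbit else 0) + Z) >> 1


Z = 0
for _ in range(100):          # page referenced in every interval
    Z = update_register(Z, 1)
print(Z / ONE)                # register value as a fraction of 1.0
print(predict(Z, 0) / ONE)    # prediction if the page is not referenced next
```

Because of truncation in the shift, a register that stops being referenced decays to a small residue rather than exactly to zero, which a real design would have to account for when choosing the cut off point.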
3.8 SPECIAL CASES OF THE ARIMA ALGORITHM

In this section we show that working set, reference frequency, and
a few other algorithms are special cases of the ARIMA algorithm. Another
special case is an extended working set, wherein the window size is
dynamically adjusted to match the program locality size. Recall from
the last section that the exponential predictor for the ARIMA(1,1,1)
model is given by:

$$\hat{z}(t) = (1-c)\,z(t-1) + c\,\bar{z}(t-2)$$
$$\bar{z}(t-2) = (1-b)\,z(t-2) + b\,\bar{z}(t-3)$$

We shall refer to these two equations as the prediction equation and the
update equation respectively. The four special cases of the ARIMA algorithm
occur when the parameters b and c take their extreme values 0 and 1.
These cases are now described.
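The prediction and update equations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `arima_page_predictor`, the starting mean `mean0`, and the use of floating point (rather than the shift-based binary arithmetic a hardware implementation would use) are ours and do not come from the report.

```python
def arima_page_predictor(refs, b=7 / 8, c=1 / 2, mean0=0.0):
    """Apply the prediction and update equations to one page's reference bits.

    refs[k] is z(k): 1 if the page was referenced in interval k, else 0.
    The k-th returned value is the prediction for interval k + 1.
    mean0 is an assumed starting value for the running mean z_bar.
    """
    mean = mean0  # z_bar, always one interval behind the newest observation
    predictions = []
    for z_prev in refs:
        # Prediction equation: z_hat(t) = (1 - c) z(t-1) + c z_bar(t-2)
        predictions.append((1 - c) * z_prev + c * mean)
        # Update equation: z_bar(t-1) = (1 - b) z(t-1) + b z_bar(t-2)
        mean = (1 - b) * z_prev + b * mean
    return predictions
```

With c = 0 the prediction collapses to the last observation, and with b = 1 the mean never moves from its starting value, which are the extreme settings considered in the cases that follow.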
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"Integrated  white  noise”  or  Wiener  process.  Thus  working  set  is  optimal 
for  programs  whose  page  reference  processes  constitute  a Wiener  process. 

Since using WS is equivalent to using a white noise model for the
first difference process y(t), the percentage improvements listed in
Table 3.3 are also percentage paging cost improvements achievable by
various ARIMA models.


Case II [b=1] : For this case the update equation takes the following
form:

$$\bar{z}(t-2) = \bar{z}(t-3) = \bar{Z} \quad \text{(say)}$$

i.e., the mean is time invariant. The prediction equation therefore
reduces to an AR(1) predictor:

$$\hat{z}(t) = (1-c)\,z(t-1) + c\,\bar{Z}$$

This is Arnold's Wiener Filter model [Arn75]. Notice that this is
applicable only if the mean is time invariant, i.e., if the z(t) process
is stationary.


Case III [c=1, b=1] : For this special case, like case II above, the
update equation implies a time invariant mean and the predictor equation
becomes

$$\hat{z}(t) = \bar{Z}$$

i.e., the pages are expected to be referenced with fixed frequency (mean
value) and the page with the least $\hat{z}$ is the candidate for
replacement. This is the Reference Frequency policy of page replacement.
This model is also known as the Independent Reference Model (IRM). For
this case the model equation becomes

$$(1-B)\,z(t) = (1-B)\,e(t) \quad \text{since } a = b - c = 0$$

or,

$$z(t) = e(t) + \bar{Z}$$

This is an ARIMA(0,0,0) or white noise model. Thus the reference
frequency model is appropriate only if the page references constitute
white noise*.

Case IV [c=b] : In this case, the predictor and the update equations
become similar.

$$\hat{z}(t) = (1-b)\,z(t-1) + b\,\bar{z}(t-2)$$
$$\bar{z}(t-2) = (1-b)\,z(t-2) + b\,\bar{z}(t-3)$$

These equations can be rewritten as follows:

$$\hat{z}(t) = \bar{z}(t-1)$$
$$\bar{z}(t-1) = (1-b)\,z(t-1) + b\,\bar{z}(t-2)$$

This is an extension of the independent reference model. Here the
reference frequency is assumed to be time varying and is computed
adaptively using an exponentially weighted average. This policy could,
therefore, be called the "Adaptive Independent Reference Model" (AIRM).
This is optimal when a = c - b = 0 and the process model is ARIMA(0,1,1):

$$(1-B)\,z(t) = (1-bB)\,e(t)$$

i.e., z(t) is the integration of a first order (colored or correlated)
noise. It could, therefore, be called a "Colored Wiener Process".


* This same conclusion was reached by Aho, Denning, and Ullman [ADU71].
They call it the A0 policy and show that the policy is optimal when the
probability of reference of a page depends neither on time
(stationarity) nor on previous program behavior (no autocorrelation).

We conclude this section on special cases of the ARIMA(1,1,1)
model by depicting all the four cases discussed above on a single
diagram as shown in Figure 3.8. The ARIMA model operates in the
triangular region 0 < c < b < 1. It is obvious from this diagram that the
ARIMA(1,1,1) is a general model and that Working Set, Arnold's Wiener
Filter, Independent Reference Model, and Adaptive Independent Reference
Model are all its boundary cases.




Figure  3.8:  Special  cases  of  ARIMA  (1,1,1)  Algorithm 
3.9 LIMITATIONS OF THE FORMULATION

In the last section it was shown that the working set policy is a
special case of the ARIMA policy. Therefore, a question that naturally
arises is whether there is any relation between the ARIMA and another
popular page replacement algorithm, LRU, and whether the LRU is also a
special case of the ARIMA. The answer is probably "no". This is
because of the limitations of the zero-one stochastic formulation that
we started with. It uses only a subset of the available past
information.

A deeper insight into the information structure can be obtained by
considering the information used by various algorithms. VMIN, the
optimal variable space algorithm, uses the complete page reference
string. WS requires knowledge of the set of pages referenced in the last
T interval. It does not require the order in which the pages are
referenced or the number of times they are referenced. A general ARIMA
model would use all the sets of pages referenced in successive T
intervals. Finally, LRU uses the set of the last m referenced pages along
with their order of reference. The Venn diagram of information used by
these algorithms is shown in Figure 3.9. The broken line in this figure
separates the past from the future. There are two inferences to be
drawn from this figure:




1. The information used by the WS is a subset of that used by the
ARIMA. This explains why ARIMA policies can always be specialized to WS
policies with proper window sizes.

2. There is much information (like the frequency of reference of a
page in an interval, the order of reference of various pages, and
cross-correlation between different page processes) that is not used
in the zero-one stochastic process formulation used to derive the ARIMA
policy. If we could somehow develop a formulation which uses the
complete past information, then both the WS and the LRU would be special
cases of the generalized model. The conclusions drawn would then be
universal in scope.
3.10 CONCLUSION

The page reference behavior modeled as a zero-one binary
stochastic process exhibits non-stationary behavior. An ARIMA(1,1,1)
model was shown to be appropriate for the process. This model is then
used to design a memory management policy.

The main results achieved in this chapter can be stated as
follows:

1. We have shown that the cost of imperfect prediction is
proportional to the square of the difference between the
predicted and the actual value.

2. Using empirical results, we have shown that the ARIMA(1,1,1)
model is an appropriate model for page reference processes.

3. We have designed a new page replacement algorithm called the ARIMA
page replacement algorithm. The algorithm is shown to be easy to
implement.

4. We have shown that many conventional algorithms like Working
Set, Reference Frequency, and Arnold's Wiener Filter algorithm
are merely boundary cases of the ARIMA algorithm. Also we have
described conditions under which these boundary cases are
optimal. In particular, we thus have a control theoretic
derivation of the WS policy.
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In  the  analysis  presented  in  this  chapter,  approximations  were 
introduced  due  to  Gaussian  assumption.  We,  therefore,  expect  that  the 
development  of  identification  methods  for  discrete  binary  processes  will 
lead  to  better  understanding  and  management  of  program  memory  behavior. 
4.1 INTRODUCTION

In this chapter we present a new approach for analysis of binary
processes. A process z(t) is called binary if the variable z(.) can
take only two values, 0 and 1*. A classic example of a binary stochastic
process is the so called "Semi-random Telegraph Signal", which consists
of a sequence of independent identically distributed binary random
variables.

In computer science, binary processes are a common occurrence. For
example, as was shown in the last chapter, the reference pattern of a
particular page constitutes a binary stochastic process which can be
used to design new memory management policies. Similarly, in database
management, record reference patterns constitute binary processes which
can be used to detect changes in reference patterns and to determine
optimum points for database reorganization. In computer networks,
packet arrivals at a node can be modeled by a zero-one process. Several
similar examples can be constructed in the areas of weather prediction,
signal detection, medical diagnosis, and information theory.


* In fact, z(.) can take any two values, say a and b. The analysis
presented here can still be used by transforming it to another process
$y(t) = (z(t) - a)/(b - a)$. Notice that the process y(t) is a zero-one
process.
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In  spite  of  the  fact  that  binary  processes  are  so  common,  it  is 
surprising  that  no  direct  technique  for  identification  and  prediction  of 
such  processes  has  been  described  in  the  published  literature.  The  two 
known  methods  for  analyzing  such  processes  are  both  indirect  [C0L66]. 
In the first method, one analyzes the intervals between successive
z(t) = 1 pulses. These intervals can be assumed to be Gaussian, and the
analysis carried out as usual. Alternatively, one can count the number
of z(t) = 1 pulses over suitable intervals of equal length and model the
resulting "count" process as a Gaussian process. It is obvious that
neither of these approaches to "modeling" the process is suitable for
the prediction of z(t) given its history up to time t.

In this chapter, we present a direct approach to modeling,
estimation, and prediction of binary processes. The approach is
analogous to that for Gaussian processes. Like the Wiener filter for a
Gaussian process (see Figure 4.1), we design a system (a Boolean system)
whose output is the predicted value $\hat{z}(t)$, and whose input is the
past history of the process. Our model is more general than the Wiener
filter in the following respects:

1. The measure of goodness of the predictor is not limited to a fixed
criterion, e.g., least-squares in the case of the Wiener filter. Our
method applies to any given criterion: linear or non-linear.

2. We do not impose the linearity condition on the system. Our method
gives the optimal non-linear predictor for the process. Further, if
the optimal predictor is not unique, our method gives all the
predictors.

3. Our model is not restricted to stationary processes alone; it is
applicable to some non-stationary processes also.

An additional feature of our model is that it gives zero-one
estimates of a zero-one process. Since z(t) is binary it is not
meaningful to have fractional estimates of z(t). For example, it is
meaningless to say $\hat{z}(t) = 0.73$ (though it is meaningful to say that
the probability of z(t) = 1 is 0.73).

The only restriction in our model, which does not appear in the
Wiener filter, is that the process is assumed to be Markov of a given
order n. A process is called Markov if the probability distribution of
z(t) given all the past history of the process depends only on a finite
past. In particular, z(t) is Markov of order n if

$$P[z(t) \mid z(1), z(2), \ldots, z(t-1)] = P[z(t) \mid z(t-n), z(t-n+1), \ldots, z(t-1)]$$

Here P[.] denotes the probability of an event.

In  this  chapter  we  develop  a general  probabilistic  model  relating 
z(t)  to  its  past  values.  Based  on  this  model,  an  expression  is  derived 
for  the  likelihood  function,  and  hence,  for  the  maximum  likelihood 
estimates  (MLE)  of  the  model  parameters.  We  show  how  the  model  is  used 
for  optimum  prediction  and  derive  a formula  for  the  total  cost  due  to 
prediction  errors.  Then  we  extend  all  results  to  the  more  general  case 
of  k-ary  processes.  In  this  case,  the  process  takes  integer  values  from 
0 through k-1. Finally, we show how the model can be used for page
replacement.
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In  the  analysis  presented  in  this  paper  we  make  frequent  use  of 
the  properties  of  pseudo-Boolean  functions.  The  essential  elements  of 
the  theory  of  such  functions  are,  therefore,  briefly  reviewed  in  the 
next  section  (adopted  from  [HaR68]).  The  material  in  the  other  sections 
of  this  paper  is  original  and,  as  far  as  is  known  to  the  author,  has  not 
appeared  anywhere  in  the  published  literature. 

4.2 BOOLEAN FUNCTIONS - FUNDAMENTALS

The  definition  of  Boolean  functions  varies  widely  among  authors. 
In  an  attempt  to  generalize  the  concepts,  even  the  pioneers  of  this 
theory,  Rudeanu  and  Hammer,  have  changed  the  definition  over  time  (e.g., 
in  [HaR68],  and  [Rud74]).  In  this  thesis,  we  adopt  the  following 
definition  from  [HaR68]. 

4.2.1 Definition : By a "Boolean function" $f(x_1, x_2, \ldots, x_n)$ of n
variables we mean a mapping

$$f : \{0,1\}^n \to \{0,1\}$$

i.e., a zero-one valued function of zero-one valued variables.

An example of a Boolean function is $f(x_1,x_2) = x_1 + x_2 - 2x_1x_2$.
The usual way to express a Boolean function is by using the Boolean
operations (e.g., conjunction, disjunction, and negation). For example,
the above function is usually written as

$$f(x_1,x_2) = \bar{x}_1 x_2 \vee x_1 \bar{x}_2$$

where "$\vee$" is the disjunction (inclusive OR) operator, bar indicates
negation, and conjunction is denoted by juxtaposition. The
transformation between the two representations is a result of the
following equivalences:

$$\bar{x} = 1 - x \qquad \forall x \in \{0,1\}$$

$$x_1 \vee x_2 = x_1 + x_2 - x_1x_2 \qquad \forall x_1, x_2 \in \{0,1\}$$

A notation which is commonly used in the literature on Boolean
functions is the following:

$$x^0 = \bar{x}, \qquad x^1 = x$$

where $x^0$ is "x sup zero" (not x raised to the power zero). To avoid
confusion, we will use $(x)^i$ to denote the ith power of a binary variable
x. Continuing with the notation, if $X = \{x_1, \ldots, x_n\}$ is a set of n
binary variables and $i_1 i_2 \ldots i_n$ is the n digit binary representation
of i, $0 \le i \le 2^n - 1$, then

$$q_i(X) = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$$

is called the ith fundamental product. For example, for n = 3,
$$q_5 = x_1^1 x_2^0 x_3^1 = x_1 \bar{x}_2 x_3 \quad \text{and} \quad q_7 = x_1^1 x_2^1 x_3^1 = x_1 x_2 x_3$$


An important property of fundamental products is that $q_i(X) = 1$ if
and only if X = i. Thus, the fundamental products are "mutually
exclusive", i.e.,

$$q_i(X)\,q_j(X) = \begin{cases} 0, & i \ne j \\ q_i(X), & i = j \end{cases}$$


There are many ways of representing a Boolean function. A few
examples are given below:

$1 - x_1 - x_2 + 2x_1x_2$   (Polynomial form)

$\bar{x}_1\bar{x}_2 \vee x_1x_2$   (Disjunctive form)

$(x_1 \vee \bar{x}_2)(\bar{x}_1 \vee x_2)$   (Conjunctive form)

$1 \oplus x_1 \oplus x_2$   (Reed-Muller form)

$\bar{x}_1\bar{x}_2 + 0\,\bar{x}_1x_2 + 0\,x_1\bar{x}_2 + x_1x_2$   (Sum of products form)
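That these five forms describe one and the same function can be checked by enumerating the four input combinations. The sketch below is illustrative only; the function names are ours, and the forms are transcribed as $1 - x_1 - x_2 + 2x_1x_2$, $\bar{x}_1\bar{x}_2 \vee x_1x_2$, $(x_1 \vee \bar{x}_2)(\bar{x}_1 \vee x_2)$, $1 \oplus x_1 \oplus x_2$, and the sum of products with coefficients 1, 0, 0, 1.

```python
from itertools import product


def polynomial(x1, x2):
    return 1 - x1 - x2 + 2 * x1 * x2


def disjunctive(x1, x2):
    return int((not x1 and not x2) or (x1 and x2))


def conjunctive(x1, x2):
    return int((x1 or not x2) and (not x1 or x2))


def reed_muller(x1, x2):
    return 1 ^ x1 ^ x2  # ^ is exclusive-or on 0/1 integers


def sum_of_products(x1, x2):
    nx1, nx2 = 1 - x1, 1 - x2  # negation written as 1 - x
    return 1 * nx1 * nx2 + 0 * nx1 * x2 + 0 * x1 * nx2 + 1 * x1 * x2


FORMS = (polynomial, disjunctive, conjunctive, reed_muller, sum_of_products)

# Truth table of each form over (0,0), (0,1), (1,0), (1,1).
TABLES = [[f(x1, x2) for x1, x2 in product((0, 1), repeat=2)] for f in FORMS]
ALL_AGREE = all(table == TABLES[0] for table in TABLES)
```

Each form evaluates to 1 exactly when the two variables are equal.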



In our analysis, we use the sum of products form. Using Shannon's
decomposition theorem, any Boolean function can be expressed in this
form as follows:

$$f(X) = \sum_{i=0}^{2^n-1} f_i\, q_i(X)$$

where $f_i = f(X \mid X = i)$, i.e., f(X) when $x_1 = i_1$, $x_2 = i_2$, ...,
$x_n = i_n$.

The concept of Boolean functions can be generalized to other
functions - not necessarily zero-one valued. Such functions are called
"pseudo-Boolean functions".

4.2.2  Definition  : Let  R be  the  field  of  real  numbers;  by  a 
pseudo-Boolean  function  f we  mean  a mapping 

$$f : \{0,1\}^n \to R$$

i.e.,  a real  valued  function  of  binary  variables. 

An example of a pseudo-Boolean function is the following function:

$$f(x_1,x_2) = 0.5(x_1)^3 + 3x_1 - 2(x_2)^2$$

In fact, all functions (including Boolean functions) of binary variables
are pseudo-Boolean functions. Therefore, the adjective "pseudo-Boolean"
may be dropped whenever it is clear from the context.

Again using Shannon's decomposition theorem, any function of
binary variables can be reduced to a "sum of products form":

$$f(X) = \sum_{i=0}^{2^n-1} f_i\, q_i(X)$$

For example,

$$f(x_1,x_2) = 1 - 0.5(x_1)^3 + 3x_1x_2 - 2(x_2)^2$$

For this function $f_0 = 1$, $f_1 = -1$, $f_2 = 0.5$, and $f_3 = 1.5$, hence,

$$f(x_1,x_2) = 1\,\bar{x}_1\bar{x}_2 + (-1)\,\bar{x}_1x_2 + 0.5\,x_1\bar{x}_2 + 1.5\,x_1x_2$$

Similarly,  x^x^  , OX.^  + 1x^2  ♦ 0xix2  ♦ ex^2 

Notice that when expressed in the sum of products form, every
function of binary variables becomes linear in each variable (i.e., each
variable appears only as its first power), although the function itself
is non-linear (due to the presence of product terms).
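The coefficients of the sum of products form are simply the values of the function at the $2^n$ binary points, so they can be obtained by direct evaluation. The following sketch, with helper names of our own choosing, does this for the example $f(x_1,x_2) = 1 - 0.5(x_1)^3 + 3x_1x_2 - 2(x_2)^2$ and then checks that the sum of products form reproduces f everywhere.

```python
from itertools import product


def sop_coefficients(f, n):
    """Return [f_0, ..., f_(2^n - 1)]; f_i is f at the n-digit binary expansion of i."""
    # product() enumerates (0,...,0), (0,...,1), ... with x_1 as the leading digit.
    return [f(*bits) for bits in product((0, 1), repeat=n)]


def fundamental_product(i, xs):
    """q_i(X): 1 exactly when the bits xs spell the binary expansion of i."""
    n = len(xs)
    digits = [(i >> (n - 1 - k)) & 1 for k in range(n)]
    return int(all(x == d for x, d in zip(xs, digits)))


def sop_value(coeffs, xs):
    """Evaluate the sum over i of f_i * q_i(X)."""
    return sum(c * fundamental_product(i, xs) for i, c in enumerate(coeffs))


def example_f(x1, x2):
    return 1 - 0.5 * x1 ** 3 + 3 * x1 * x2 - 2 * x2 ** 2


COEFFS = sop_coefficients(example_f, 2)
```

Because exactly one fundamental product is non-zero at any point, `sop_value` just selects the matching coefficient.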
Let $Z_j^i$ denote the set of i observations immediately preceding
z(j), i.e., {z(j-i), z(j-i+1), ..., z(j-1)}. Thus
$Z_t^{t-1} = \{z(1), z(2), \ldots, z(t-1)\}$ denotes the complete past history of
the process. Let $p_t = P[z(t)=1 \mid z(1), z(2), \ldots, z(t-1)]$ denote the
probability of z(t) = 1 given the past history of the process. The
simplest binary process is the so-called "Binary White Noise" (BWN) or
Bernoulli Process. It is defined as the sequence of independent identically
distributed binary random variables. The semi-random telegraph signal
described previously is a BWN. Also if we associate a time index to
successive Bernoulli trials, they will constitute a BWN. A BWN can also
be obtained by filtering and clipping a Gaussian white noise. A BWN
with parameter p will be denoted by BWN(p).

For a Markov process of order n, $p_t$ depends only on the past n
values $Z_t^n = \{z(t-n), z(t-n+1), \ldots, z(t-1)\}$. We can represent the most
general non-linear dependence of $p_t$ on $Z_t^n$ by saying that
$p_t = h(Z_t^n)$, where h is some non-linear function of $Z_t^n$ such that
$0 \le h \le 1$. In the sum of products form, we have

$$p_t = \sum_{i=0}^{2^n-1} h_i\, q_i(Z_t^n) \qquad [4.1]$$

where $h_i = h(Z_t^n \mid Z_t^n = i)$
      = value of $h(Z_t^n)$ when z(t-n), ..., z(t-1) take values
        corresponding to the binary expansion of i,

and $q_i(Z_t^n)$ = ith fundamental product of z(t-n), ..., z(t-1).

The equation for z(t) corresponding to this equation is

$$z(t) = \sum_{i=0}^{2^n-1} e_i(t)\, q_i(Z_t^n) \qquad [4.2]$$

where $e_i(t) \sim BWN(h_i)$.

By taking expectations of both sides of equation (4.2), it can be
shown to be equivalent to equation (4.1). Notice that $Z_t^n$ denotes the
"state" of the process. The process can be in any one of $2^n$ states
corresponding to $i = 0, 1, \ldots, 2^n-1$. The distribution of the future
value z(t) in state i is Bernoulli with parameter $h_i$.

For example, the Boolean model of a second order Markov process is

$$z(t) = e_0(t)\,\bar{z}(t-2)\bar{z}(t-1) + e_1(t)\,\bar{z}(t-2)z(t-1) + e_2(t)\,z(t-2)\bar{z}(t-1) + e_3(t)\,z(t-2)z(t-1)$$
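A short simulation may help to see how the Boolean model generates data. In the sketch below the state index i is read off the last n observations, and only the noise term of the active state is drawn, since every other fundamental product is zero. The function name, the seed handling, and the all-zero initial conditions are assumptions made for the illustration (n is taken to be at least 1).

```python
import random


def simulate_boolean_model(h, n, length, seed=0, initial=None):
    """Generate z(1), ..., z(length) from z(t) = sum_i e_i(t) q_i(Z_t^n).

    h[i] is the Bernoulli parameter of state i, where i is the integer whose
    n-digit binary expansion is z(t-n) ... z(t-1).
    """
    if len(h) != 2 ** n:
        raise ValueError("h must have 2^n entries")
    rng = random.Random(seed)
    past = list(initial) if initial is not None else [0] * n
    out = []
    for _ in range(length):
        state = int("".join(str(bit) for bit in past), 2)
        # Only q_state is 1, so z(t) equals e_state(t), a BWN(h[state]) draw.
        z = 1 if rng.random() < h[state] else 0
        out.append(z)
        past = past[1:] + [z]
    return out
```

For instance, with n = 2 and h = [1, 1, 0, 0] the draw is deterministic and z(t) is the complement of z(t-2), which gives the repeating pattern 1, 1, 0, 0.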
The proposed model (equation 4.1 or 4.2) has $2^n$ parameters
$h_0, h_1, \ldots, h_{2^n-1}$. In this section we develop a likelihood function
for the observations, and find the expression for maximum likelihood
estimates of these parameters. To develop the result in the form of
Theorem 4.4.2, we need the following lemma.


4.4.1 Lemma [Function Lemma] : Let f be a mapping $f : R \to R$,
i.e., a real valued function of a real variable, and let

$$p = \sum_{i=0}^{2^n-1} h_i\, q_i(X)$$

then

$$f(p) = \sum_{i=0}^{2^n-1} f(h_i)\, q_i(X)$$

Proof : Let p = h(X)

so that f(p) = f(h(X))

Since the right hand side of the above equation is a pseudo-Boolean
function, it can be written in a sum of products form:

$$f(h(X)) = \sum_{i=0}^{2^n-1} f(h(X \mid X=i))\, q_i(X) = \sum_{i=0}^{2^n-1} f(h_i)\, q_i(X)$$

[Q.E.D.]
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Some  examples  of  the  use  of  the  Function  Lemma  are  given  below. 


$$p^2 = \sum_{i=0}^{2^n-1} h_i^2\, q_i(X)$$


1 22s1 

p"  3 L hi'Vx) 


$$\log(1-p) = \sum_{i=0}^{2^n-1} \log(1-h_i)\, q_i(X)$$


4.4.2 Theorem [Estimation Theorem] : The maximum likelihood estimate of
$h_i$ based on N observations {z(1), z(2), ..., z(N)} is given by

$$\hat{h}_i = \frac{m_{i1}}{m_{i0} + m_{i1}}, \qquad i = 0, 1, \ldots, 2^n-1$$

where $m_{i0}$ = # of times the sequence $Z_t^n = i$ is followed by z(t) = 0

and $m_{i1}$ = # of times the sequence $Z_t^n = i$ is followed by z(t) = 1


Proof : Let $H = (h_0, h_1, \ldots, h_{2^n-1})$ be the set of parameters.

$$p_t = P[z(t)=1 \mid Z_t^n, H] = \sum_{i=0}^{2^n-1} h_i\, q_i(Z_t^n)$$

$$\bar{p}_t = 1 - p_t = P[z(t)=0 \mid Z_t^n, H]$$

The above two equations for $p_t$ and $\bar{p}_t$ can be combined as follows:

$$P[z(t) \mid Z_t^n, H] = P[z(t)=1 \mid Z_t^n, H]^{z(t)}\; P[z(t)=0 \mid Z_t^n, H]^{\bar{z}(t)}$$

Therefore,

$$P[z(N), z(N-1), \ldots, z(1) \mid z(-n+1), z(-n+2), \ldots, z(0), H]$$
$$= P[z(N) \mid z(N-1), \ldots, z(1), Z_1^n, H]\; P[z(N-1) \mid z(N-2), \ldots, z(1), Z_1^n, H] \cdots P[z(1) \mid Z_1^n, H]$$
$$= \prod_{t=1}^{N} P[z(t) \mid Z_t^n, H]$$

The above equation gives the likelihood that the N observations came
from a model with parameter $H = (h_0, h_1, \ldots, h_{2^n-1})$. Notice that we
assume the initial conditions z(-n+1), ..., z(0) to be given (or to be
assumed equal to zero). Only the parameters are to be estimated. The
likelihood function is

$$L(H) = \prod_{t=1}^{N} p_t^{z(t)}\, \bar{p}_t^{\bar{z}(t)}$$

Taking the log of the above equation we get the log likelihood function

$$l(H) = \log\{L(H)\} = \sum_{t=1}^{N} \left( z(t)\log p_t + \bar{z}(t)\log \bar{p}_t \right)$$

Now using the Function Lemma,

$$\sum_{t=1}^{N} z(t)\log p_t = \sum_{t=1}^{N} z(t) \sum_{i=0}^{2^n-1} (\log h_i)\, q_i(Z_t^n) = \sum_{i=0}^{2^n-1} \log(h_i) \sum_{t=1}^{N} z(t)\, q_i(Z_t^n) = \sum_{i=0}^{2^n-1} m_{i1}\log(h_i)$$

where

$$m_{i1} = \sum_{t=1}^{N} z(t)\, q_i(Z_t^n)$$

= # of times $Z_t^n = i$ is followed by z(t) = 1

The last equality is a result of the observation that $z(t)\,q_i(Z_t^n)$ is 1
if and only if z(t) = 1 and $Z_t^n = i$. Similarly,

$$\sum_{t=1}^{N} \bar{z}(t)\log \bar{p}_t = \sum_{i=0}^{2^n-1} m_{i0}\log(1-h_i)$$

Therefore, the log likelihood function is given by

$$l(H) = \sum_{i=0}^{2^n-1} \{ m_{i1}\log(h_i) + m_{i0}\log(1-h_i) \}$$

The maximum likelihood estimate of $h_i$ is obtained by setting the first
derivative of the log likelihood function equal to zero, i.e.,

$$\frac{\partial l}{\partial h_i} = 0 = \frac{m_{i1}}{h_i} - \frac{m_{i0}}{1-h_i}$$

or

$$\hat{h}_i = \frac{m_{i1}}{m_{i0} + m_{i1}}$$

[Q.E.D.]
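In computational terms the Estimation Theorem amounts to counting, for each state, how often it is followed by a 0 and how often by a 1. The sketch below is one possible rendering; the function name and the `None` marker for states that never occur are our own choices, and the initial conditions default to zero as permitted in the proof.

```python
def estimate_h(z, n, initial=None):
    """Return (m0, m1, h_hat) for an order-n Boolean model fitted to the bits z.

    m0[i] and m1[i] count how often state i is followed by z(t) = 0 and
    z(t) = 1 respectively; h_hat[i] = m1[i] / (m0[i] + m1[i]).
    """
    past = list(initial) if initial is not None else [0] * n
    m0 = [0] * (2 ** n)
    m1 = [0] * (2 ** n)
    for bit in z:
        i = int("".join(str(b) for b in past), 2)  # state index of Z_t^n
        if bit:
            m1[i] += 1
        else:
            m0[i] += 1
        past = past[1:] + [bit]
    h_hat = [
        m1[i] / (m0[i] + m1[i]) if m0[i] + m1[i] else None  # None: state never seen
        for i in range(2 ** n)
    ]
    return m0, m1, h_hat
```

For the sequence 1, 1, 0, 0, 1, 1, 0, 0 with n = 2, each of the four states is visited twice and the estimates come out as 1, 1, 0, 0.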


4.5 MEASURES OF GOODNESS


In the case of Gaussian random variables it is common to define
the "best estimate" in the least-squares sense (LSE), i.e., $\hat{z}$ is the
best estimate of z if $E[(z-\hat{z})^2]$ is minimum. In the case of binary
variables, the role is played by what we propose to call the "Least XOR
Estimate" (LXE), and the "Least Cost Estimate" (LCE).


4.5.1 Least XOR Estimate : Since both z and $\hat{z}$ can take only two values,
there are only 4 cases to be considered as shown below. Here e is used
to denote the error variable.

    z    $\hat{z}$    Error    e

    0    0    No       0
    0    1    Yes      1
    1    0    Yes      1
    1    1    No       0

It is easy to see from the above table that $e = z \oplus \hat{z}$ (exclusive-or
of z and $\hat{z}$). The minimum number of error cases will be obtained if
$E[z \oplus \hat{z}]$ is minimum. The estimate $\hat{z}$ which minimizes
$E[z \oplus \hat{z}]$ is the least XOR estimate (also the least error estimate).
It is easy to verify that LXE is equivalent to LSE for binary variables,
i.e., $(z-\hat{z})^2 = z \oplus \hat{z}$.

4.5.2 Least Cost Estimate : In formulating LXE, it was assumed that both
kinds of error, $z=1, \hat{z}=0$ and $z=0, \hat{z}=1$, are equally costly. In
the signal processing area, these two errors are called "miss signal" and
"false alarm" respectively. The cost of these two types of errors is
generally different. For example, in the case of weather prediction, the
cost of predicting a storm and not actually getting one is quite different
from that of getting an unpredicted storm. Similarly, in the case of memory
management, the cost of a page fault (miss signal) is not always the
same as the cost of keeping an unused page for some time (false alarm).

In such cases, therefore, we propose a generalized concept to be
called the "least cost estimate" or LCE. In this case, the cost
function $C(z,\hat{z})$ is a given, not necessarily linear, function of z and
$\hat{z}$. Now by Shannon's decomposition theorem, we can express C as follows:

$$C(z,\hat{z}) = c_0\,\bar{z}\bar{\hat{z}} + c_1\,\bar{z}\hat{z} + c_2\,z\bar{\hat{z}} + c_3\,z\hat{z}$$

where $c_0 = C(0,0)$, $c_1 = C(0,1)$, $c_2 = C(1,0)$, $c_3 = C(1,1)$.

Here $c_2$ and $c_1$ are the costs of a miss signal and a false alarm
respectively. Without loss of generality, we can assume that $c_0 = c_3 = 0$.
This is because

$$C(z,\hat{z}) = \{c_0\,\bar{z} + c_3\,z\} + \{(c_1-c_0)\,\bar{z}\hat{z} + (c_2-c_3)\,z\bar{\hat{z}}\}$$

The part within the first set of braces is independent of $\hat{z}$, and hence
the problem is equivalent to one with cost of miss signal $c_2-c_3$, cost of
false alarm $c_1-c_0$, and zero cost for correct prediction.
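The reduction to zero cost for correct prediction can be verified by enumerating the four (z, $\hat{z}$) cases. In the sketch below the function names and the sample cost values are ours, chosen only for illustration.

```python
from itertools import product


def cost(z, zh, c0, c1, c2, c3):
    """C(z, z_hat) in sum of products form; c1 = false alarm, c2 = miss signal."""
    return (c0 * (1 - z) * (1 - zh) + c1 * (1 - z) * zh
            + c2 * z * (1 - zh) + c3 * z * zh)


def reduced_cost(z, zh, c0, c1, c2, c3):
    """The part of the cost that depends on the estimate z_hat."""
    return (c1 - c0) * (1 - z) * zh + (c2 - c3) * z * (1 - zh)


def baseline(z, c0, c3):
    """The part that is independent of z_hat: c0 (1 - z) + c3 z."""
    return c0 * (1 - z) + c3 * z


SAMPLE = (1, 5, 3, 2)  # hypothetical values of c0, c1, c2, c3
DECOMPOSITION_HOLDS = all(
    cost(z, zh, *SAMPLE) == baseline(z, SAMPLE[0], SAMPLE[3]) + reduced_cost(z, zh, *SAMPLE)
    for z, zh in product((0, 1), repeat=2)
)
```

Since the baseline does not involve the estimate, minimizing the full cost and minimizing the reduced cost select the same predictor.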


We now return to our original problem of finding the Boolean
function g, such that the estimate $\hat{z}(t) = g(Z_t^n)$ minimizes a given
cost function. The prediction method that we are going to describe is based
on the two theorems below.

4.6.1 Theorem [Prediction Theorem] : Given the model relating z(t) to
$Z_t^n$

$$z(t) = \sum_{i=0}^{2^n-1} e_i(t)\, q_i(Z_t^n)$$

the estimate $\hat{z}(t)$ which minimizes the expected value of cost function
$C(z(t),\hat{z}(t))$ for N observations is given by

$$\hat{z}(t) = \sum_{i=0}^{2^n-1} \hat{e}_i\, q_i(Z_t^n)$$

where $\hat{e}_i$, $i = 0, 1, \ldots, 2^n-1$, are zero-one valued variables chosen
as follows:

$$\hat{e}_i = \begin{cases} 1, & \text{if } h_i > r \\ 0, & \text{if } h_i < r \\ d, & \text{if } h_i = r \end{cases}$$

where

$$r = \frac{c_1}{c_1 + c_2}$$

and d represents a "don't care" condition, i.e., either 0 or 1 would do
equally well.
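The threshold rule of the Prediction Theorem is easy to state in code. The sketch below uses exact fractions so that the "don't care" case $h_i = r$ is detected reliably; the function names are ours, and `expected_cost` is the per-state cost $c_1(1-h_i)\hat{e}_i + c_2 h_i(1-\hat{e}_i)$ that the rule minimizes. Integer costs are assumed.

```python
from fractions import Fraction


def expected_cost(h_i, e_i, c1, c2):
    """Per-state expected cost for a 0/1 choice e_i when z(t) is Bernoulli(h_i)."""
    return c1 * (1 - h_i) * e_i + c2 * h_i * (1 - e_i)


def predictor_entry(h_i, c1, c2):
    """Return 1, 0 or 'd' according as h_i is above, below or equal to r."""
    r = Fraction(c1, c1 + c2)  # c1 = cost of false alarm, c2 = cost of miss signal
    if h_i > r:
        return 1
    if h_i < r:
        return 0
    return "d"
```

With $c_1 = 2$ and $c_2 = 1$ the threshold is 2/3, so a state with $h_i = 9/10$ predicts 1, a state with $h_i = 1/5$ predicts 0, and a state with $h_i = 2/3$ is a don't care.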
Proof : Let the desired estimate be

$$\hat{z}(t) = g(Z_t^n)$$

where $g(Z_t^n)$ is a Boolean function of $Z_t^n$. Again, using the
Decomposition Theorem, we have

$$\hat{z}(t) = \sum_{i=0}^{2^n-1} \hat{e}_i\, q_i(Z_t^n)$$

where $\hat{e}_i$ is a zero-one valued variable given by
$\hat{e}_i = g(Z_t^n \mid Z_t^n = i)$.

Since

$$z(t) = \sum_{i=0}^{2^n-1} e_i(t)\, q_i(Z_t^n)$$

the exclusion property of the fundamental products enables us to write
the cost function as follows:

$$C(z(t),\hat{z}(t)) = c_1\,\bar{z}(t)\hat{z}(t) + c_2\,z(t)\bar{\hat{z}}(t)$$
$$= c_1 \sum_{i=0}^{2^n-1} \bar{e}_i(t)\,\hat{e}_i\, q_i(Z_t^n) + c_2 \sum_{i=0}^{2^n-1} e_i(t)\,\bar{\hat{e}}_i\, q_i(Z_t^n)$$
$$= \sum_{i=0}^{2^n-1} \{ c_1\,\bar{e}_i(t)\,\hat{e}_i + c_2\,e_i(t)\,\bar{\hat{e}}_i \}\, q_i(Z_t^n)$$

Taking expectation, we have

$$E[C(z(t),\hat{z}(t))] = \sum_{i=0}^{2^n-1} (c_1\,\bar{h}_i\,\hat{e}_i + c_2\,h_i\,\bar{\hat{e}}_i)\, E[q_i(Z_t^n)] = \sum_{i=0}^{2^n-1} C(h_i,\hat{e}_i)\, E[q_i(Z_t^n)]$$

Thus we have decomposed the expected cost into $2^n$ small components each
of which can be independently optimized. Consider the ith component
$C(h_i,\hat{e}_i)$:

$$C(h_i,\hat{e}_i) = c_1\,\bar{h}_i\,\hat{e}_i + c_2\,h_i\,\bar{\hat{e}}_i = c_2 h_i + \hat{e}_i\,(c_1 - (c_1 + c_2)h_i)$$

The last expression is linear in $\hat{e}_i$. It is minimum if each $\hat{e}_i$
is chosen as stated in the theorem.

[Q.E.D.]

Notice the similarity in expressions for z(t) and $\hat{z}(t)$. The
expression for $\hat{z}(t)$ can be obtained from that for z(t) by replacing
$e_i(t)$ by $\hat{e}_i$. In fact, $\hat{e}_i$ is the best estimate of the binary
white noise $e_i(t)$ if the cost function is $C(e_i(t),\hat{e}_i)$.

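4.6.2 Theorem [Total Cost Theorem] : The total cost of imperfect
prediction for N observations by using the Prediction Theorem is

$$TC = \sum_{i=0}^{2^n-1} \min(c_2 m_{i1},\, c_1 m_{i0})$$

Proof :

$$TC = \sum_{j=1}^{N} C(z(j),\hat{z}(j))$$

Now

$$\sum_{j=1}^{N} \bar{z}(j)\hat{z}(j) = \sum_{j=1}^{N} \sum_{i=0}^{2^n-1} \bar{z}(j)\,\hat{e}_i\, q_i(Z_j^n) = \sum_{i=0}^{2^n-1} \hat{e}_i \sum_{j=1}^{N} \bar{z}(j)\, q_i(Z_j^n) = \sum_{i=0}^{2^n-1} \hat{e}_i\, m_{i0}$$

Similarly,

$$\sum_{j=1}^{N} z(j)\bar{\hat{z}}(j) = \sum_{i=0}^{2^n-1} \bar{\hat{e}}_i\, m_{i1}$$

Hence,

$$TC = \sum_{i=0}^{2^n-1} (c_1\,\hat{e}_i\, m_{i0} + c_2\,\bar{\hat{e}}_i\, m_{i1}) = \sum_{i=0}^{2^n-1} \min(c_2 m_{i1},\, c_1 m_{i0})$$

The last step holds because, with $h_i$ estimated by $m_{i1}/(m_{i0}+m_{i1})$,
$\hat{e}_i = 1$ exactly when $c_2 m_{i1} > c_1 m_{i0}$, so each term equals the
smaller of the two costs.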
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This  method  is  a result  of  combining  the  three  theorems  described 
before,  viz.,  the  Estimation  Theorem,  the  Prediction  Theorem,  and  the 
Total  Cost  Theorem.  The  method  consists  of  the  following  steps  : 

1. Summarize the observed data in terms of frequency of occurrence of
various fundamental product terms. The summary is arranged in a
tabular form as shown in Table 4.1. The table has $2^n$ rows and 5
columns. The columns are named Z, M, H, $\hat{E}$, and TC respectively.

2. The Z column consists of n subcolumns corresponding to n variables
z(t-n), z(t-n+1), ..., z(t-1). The ith row in this column is simply
the n digit binary expansion of i-1.

3. The M column consists of 2 subcolumns corresponding to z(t) = 0 and
z(t) = 1 respectively. The entry in the ith row of the first
subcolumn is $m_{i0}$, i.e., the number of observations with z(t) = 0 and
$Z_t^n = i$. Similarly, the entry in the second subcolumn is the number
of observations with z(t) = 1 and $Z_t^n = i$.

4. The entries in the $h_i$ column are obtained from those in the M
column as follows:

$$h_i = \frac{m_{i1}}{m_{i0} + m_{i1}} = \frac{\text{entry in the } z(t)=1 \text{ subcolumn}}{\text{sum of entries in the } z(t)=1 \text{ and } z(t)=0 \text{ subcolumns}}$$

5. The entries in the $\hat{e}_i$ column are either 0, 1, or d according as
$h_i$ is less than, greater than, or equal to the ratio $r = c_1/(c_1 + c_2)$.
If in a particular row, both $m_{i0}$ and $m_{i1}$ are zero, the $\hat{e}$ entry
in that row is d.

6. The entries in the TC column are calculated according to the Total
Cost Theorem, i.e., the ith entry is $\min(c_2 m_{i1},\, c_1 m_{i0})$.

7. Synthesize the Boolean function represented by the $\hat{e}$ column. This
is the optimum predictor. In sum of products form the function is
simply $\sum_i \hat{e}_i\, q_i(Z_t^n)$.

8. The goodness of fit is given by the total cost calculated by
summing up the TC column.

We  now  illustrate  the  method  with  an  example. 

4.7.1 Example : The data consists of 144 observations on a 4th order binary process. The actual observations have not been included here; instead, the frequency of occurrence of the various combinations is presented in Table 4.2. The cost of a false alarm is twice that of a miss signal, i.e., $c_1 = 2$ and $c_2 = 1$. The ratio $r = c_1/(c_1 + c_2) = 2/3$. The $h_i$ column is constructed as usual. The entries in the $\hat{z}$ column are 1 or 0 according as the entries in the $h_i$ column are greater or less than 2/3. Two of the $h_i$'s are exactly equal to 2/3. Hence, the $\hat{z}$ entries in these rows are "don't care" entries marked as $d_1$ and $d_2$ respectively. The predictor corresponding to $d_1 d_2 = 00$ is

$$\begin{aligned}
\hat{z}(t) ={}& \bar{z}(t-4)\bar{z}(t-3)\bar{z}(t-2)\bar{z}(t-1) + \bar{z}(t-4)\bar{z}(t-3)z(t-2)\bar{z}(t-1) \\
&+ z(t-4)\bar{z}(t-3)\bar{z}(t-2)z(t-1) + z(t-4)\bar{z}(t-3)z(t-2)\bar{z}(t-1) \\
&+ z(t-4)\bar{z}(t-3)z(t-2)z(t-1) + z(t-4)z(t-3)\bar{z}(t-2)\bar{z}(t-1) \\
&+ z(t-4)z(t-3)z(t-2)z(t-1) \\
={}& 1 - z(t-1) - z(t-3) - z(t-4) + z(t-1)z(t-3) + 2z(t-1)z(t-4) \\
&+ z(t-2)z(t-4) + 2z(t-3)z(t-4) - z(t-1)z(t-2)z(t-4) \\
&- 3z(t-1)z(t-3)z(t-4) - 2z(t-2)z(t-3)z(t-4) \\
&+ 3z(t-1)z(t-2)z(t-3)z(t-4)
\end{aligned}$$

Similar equations can be written for 3 other equally good predictors corresponding to $d_1 d_2$ = 01, 10, 11. All these predictors give the same total cost of 50.

TABLE 4.1 : Tabular Arrangement for Boolean Model

                                    # of obsv. with
z(t-n)  ...  z(t-2)  z(t-1)     z(t)=0         z(t)=1          h_i                   ẑ_i    TC_i
  0     ...    0       0        m_00           m_01            m_01/(m_00+m_01)      ...    ...
  0     ...    0       1        m_10           m_11            ...                   ...    ...
  0     ...    1       0        m_20           m_21            ...                   ...    ...
  0     ...    1       1        m_30           m_31            ...                   ...    ...
 ...    ...   ...     ...       ...            ...             ...                   ...    ...
  1     ...    1       1        m_{2^n-1,0}    m_{2^n-1,1}     ...                   ...    ...

TABLE 4.2 : Frequency Distribution for Data of Example 4.7.1

z(t-4)  z(t-3)  z(t-2)  z(t-1)   m_i0   m_i1   h_i    ẑ_i   min(2 m_i0, m_i1)
  0       0       0       0        1      9    0.90    1           2
  0       0       0       1        8      2    0.20    0           2
  0       0       1       0        3      8    0.73    1           6
  0       0       1       1        7      1    0.13    0           1
  0       1       0       0        3      2    0.40    0           2
  0       1       0       1        9      7    0.44    0           7
  0       1       1       0        2      4    0.67    d1          4
  0       1       1       1        6      0    0.00    0           0
  1       0       0       0        5      3    0.38    0           3
  1       0       0       1        1      8    0.89    1           2
  1       0       1       0        2      9    0.82    1           4
  1       0       1       1        0      7    1.00    1           0
  1       1       0       0        2      8    0.80    1           4
  1       1       0       1        7      5    0.42    0           5
  1       1       1       0        1      2    0.67    d2          2
  1       1       1       1        3      9    0.75    1           6

                                                   Total Cost = 50
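The tabular method for the binary case can be sketched in a few lines of Python. The sketch below is an illustration rather than part of the original report: the names `TABLE_4_2`, `tabulate` and `predictor` are hypothetical, and the comparison of $h_i$ with $r$ is done in integers ($h_i > r$ is equivalent to $c_2 m_{i1} > c_1 m_{i0}$) to avoid rounding problems at the "don't care" rows.

```python
from fractions import Fraction

# Frequencies (m_i0, m_i1) from Table 4.2. Row index i is the binary value of
# (z(t-4), z(t-3), z(t-2), z(t-1)), with z(t-1) as the least significant bit.
TABLE_4_2 = [(1, 9), (8, 2), (3, 8), (7, 1), (3, 2), (9, 7), (2, 4), (6, 0),
             (5, 3), (1, 8), (2, 9), (0, 7), (2, 8), (7, 5), (1, 2), (3, 9)]


def tabulate(counts, c1, c2):
    """Return one (h_i, zhat_i, TC_i) tuple per row; zhat_i is 0, 1 or "d"."""
    rows = []
    for m0, m1 in counts:
        total = m0 + m1
        h = Fraction(m1, total) if total else None
        # h_i > r  <=>  c2*m1 > c1*m0, where r = c1 / (c1 + c2)
        if c2 * m1 > c1 * m0:
            zhat = 1
        elif c2 * m1 < c1 * m0:
            zhat = 0
        else:
            zhat = "d"  # don't care (also covers rows with no observations)
        rows.append((h, zhat, min(c2 * m1, c1 * m0)))
    return rows


def predictor(rows, dont_care=0):
    """Build zhat(t) as a lookup on the past values (z(t-n), ..., z(t-1))."""
    table = [dont_care if z == "d" else z for _, z, _ in rows]

    def predict(past):
        i = 0
        for bit in past:
            i = 2 * i + bit
        return table[i]

    return predict


rows = tabulate(TABLE_4_2, c1=2, c2=1)
total_cost = sum(tc for _, _, tc in rows)
predict = predictor(rows)          # d1 d2 = 00
print(total_cost)                  # 50, the total cost reported in Table 4.2
```

Choosing `dont_care=1` instead gives another of the equally good predictors; the total cost does not depend on that choice.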
4.8 GENERALIZATION TO K-ARY VARIABLES

In  this  section  we  generalize  the  analysis  done  so  far  to  the  case 
where  the  process  may  take  k values  0,1,..., k-1.  To  do  this 
generalization,  we  use  the  concept  of  Boolean  functions  extended  for 
k-ary  variables.  This  concept  is  due  to  Rosenberg  [HaR68,  p.  301]. 


Let $B_k = \{0, 1, \ldots, k-1\}$. A Boolean function is now defined as a mapping $f : (B_k)^n \to B_k$ and a pseudo-Boolean function as $f : (B_k)^n \to R$.

For any $x \in B_k$, we define the so called "Lagrangean Development" $x^i$ ($x$ superscript $i$) as:

$$x^i = \frac{x(x-1)\cdots(x-i+1)(x-i-1)\cdots(x-k+1)}{i(i-1)\cdots(1)(-1)\cdots(i-k+1)}$$

mapping $B_k$ into $B_2$. For example, when $k=3$:

$$x^0 = \tfrac{1}{2}(x-1)(x-2), \qquad x^1 = -x(x-2), \qquad x^2 = \tfrac{1}{2}x(x-1)$$

Notice that

$$x^i = \begin{cases} 1 & \text{if } x = i \\ 0 & \text{otherwise} \end{cases}$$

Let $i_1 i_2 \cdots i_n$ be the k-ary expansion of $i$, and $X = \{x_1, x_2, \ldots, x_n\}$; then

$$X^i = q_i(X) = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n} = i\text{th fundamental product}$$

Any pseudo-Boolean function has a Lagrangean development (sum of products form):

$$f(X) = \sum_{i=0}^{k^n-1} f(i)\, X^i$$

Again,

$$q_i(X) = \begin{cases} 1 & \text{if } X = i \\ 0 & \text{otherwise} \end{cases}$$

Therefore, the fundamental products are mutually disjoint, i.e.,

$$q_i(X)\, q_j(X) = \begin{cases} 0 & i \ne j \\ q_i(X) & i = j \end{cases}$$

So, the Function Lemma holds, i.e., if $f$ is a real valued function of a real variable, and if

$$P = \sum_i h_i\, q_i(X)$$

then

$$f(P) = \sum_i f(h_i)\, q_i(X)$$

4.8.1 Model : The relationship between $z(t)$ and $Z_t^n$ in its most general form is given by

$$z(t) = \sum_{i=0}^{k^n-1} e_i(t)\, q_i(Z_t^n)$$

where $e_i(t)$ is a k-ary white noise (sequence of independent identically distributed random variables) with $P[e_i(t) = u] = h_{iu}$. Hence,

$$P[z(t) = u \mid Z_t^n] = \sum_{i=0}^{k^n-1} h_{iu}\, q_i(Z_t^n), \qquad u = 0, 1, \ldots, k-1$$

There is an additional constraint, however, that

$$\sum_{u=0}^{k-1} P[z(t) = u \mid Z_t^n] = 1$$

This constraint implies that

$$\sum_{u=0}^{k-1} h_{iu} = 1, \qquad i = 0, 1, \ldots, k^n-1$$

With this model, all the results of the binary case, viz., Estimation theorem, Measures of goodness, Prediction theorem, and Total Cost theorem, can be generalized to the k-ary case. These generalizations are stated below. The proofs of the theorems, being similar to the binary case, are given in Appendix C.


4.8.2 Estimation Theorem : The maximum likelihood estimates of $h_{iu}$ based on $N$ observations $\{z(1), z(2), \ldots, z(N)\}$ are given by

$$\hat{h}_{iu} = \frac{m_{iu}}{\sum_{v=0}^{k-1} m_{iv}}, \qquad u = 0, 1, \ldots, k-1$$

where $m_{iu}$ = # of times $Z_t^n = i$ is followed by $z(t) = u$.
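The estimate amounts to counting transitions. The following sketch (hypothetical names `count_transitions` and `estimate_h`; the short sequence in the usage lines is made up for illustration) builds the $m_{iu}$ counts from an observed k-ary sequence and normalizes each row. It assumes the state index $i$ is the k-ary value of $(z(t-n), \ldots, z(t-1))$ with $z(t-1)$ least significant, as in the tables.

```python
from collections import defaultdict
from fractions import Fraction


def count_transitions(z, n, k):
    """m[i][u] = number of times state Z_t^n = i is followed by z(t) = u."""
    m = defaultdict(lambda: [0] * k)
    for t in range(n, len(z)):
        i = 0
        for value in z[t - n:t]:       # z(t-n), ..., z(t-1)
            i = i * k + value
        m[i][z[t]] += 1
    return m


def estimate_h(m):
    """Maximum likelihood estimates h_iu = m_iu / sum_v m_iv for visited states."""
    return {i: [Fraction(c, sum(row)) for c in row] for i, row in m.items()}


# Illustrative ternary sequence, first order model (n = 1, k = 3).
z = [0, 1, 2, 0, 1, 2, 0, 1, 1]
m = count_transitions(z, n=1, k=3)
h = estimate_h(m)
```

States that never occur in the data get no row at all, which corresponds to an all-"don't care" row in the tabular method.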


4.8.3 Measures of Goodness : In the case of k-ary variables the least cost estimate $\hat{z}(t)$ is obtained by minimizing a general cost function $C(z, \hat{z})$. The function can be expressed in the sum of products form as follows:

$$C(z, \hat{z}) = \sum_{u=0}^{k-1} \sum_{v=0}^{k-1} c_{uv}\, z^u \hat{z}^v$$

where $c_{uv} = C(u, v)$ = cost of misprediction when $z = u$ and $\hat{z} = v$. It is often easier to specify $C$ as a k by k matrix whose $(u+1, v+1)$th element is $c_{uv}$. A special case of the least cost estimate occurs when

$$C(z, \hat{z}) = 1 - \sum_{j=0}^{k-1} z^j \hat{z}^j$$

It is easy to verify that in this case,

$$c_{uv} = \begin{cases} 0 & \text{if } u = v \\ 1 & \text{otherwise} \end{cases}$$

i.e., all errors cost the same. Thus, by minimizing this cost function, one obtains the least number of errors. The estimate obtained could be called "Least XOR Estimate" (LXE), because of the form of the cost function. In general, LXE is not the same as LSE except in the binary case.

4.8.4 Prediction Theorem : Given the model equation

$$z(t) = \sum_{i=0}^{k^n-1} e_i(t)\, q_i(Z_t^n)$$

the estimate $\hat{z}(t)$ which minimizes the expected value of the cost function $C(z(t), \hat{z}(t))$ is given by

$$\hat{z}(t) = \sum_{i=0}^{k^n-1} \hat{z}_i\, q_i(Z_t^n)$$

where $\hat{z}_i$, $i = 0, 1, \ldots, k^n-1$ are k-ary variables chosen as follows:

$$\hat{z}_i = \arg\min_v \sum_{u=0}^{k-1} c_{uv} h_{iu} = \arg\min_v \sum_{u=0}^{k-1} c_{uv} m_{iu}$$

4.8.4.1 Corollary : The least XOR estimate ($c_{uv} = 1$, $u \ne v$) is given by

$$\hat{z}_i = \arg\max_v m_{iv}$$

4.8.5 Total Cost Theorem : Given a set of N observations on $z(t)$, the total cost of imperfect prediction by using the Prediction Theorem is

$$TC = \sum_{i=0}^{k^n-1} \min_v \sum_{u=0}^{k-1} m_{iu} c_{uv}$$

4.8.5.1 Corollary : The total cost in the case of LXE is given by

$$TC = \sum_{i=0}^{k^n-1} \left( \sum_{u=0}^{k-1} m_{iu} - \max_v m_{iv} \right)$$

4.8.6 Tabular Method : The method is very similar to that for the binary case. The only addition is an MC column, which is obtained by post-multiplying the M column by the C matrix. We illustrate the procedure with an example.

4.8.6.1 Example : Consider the problem of predicting a ternary process $z(t)$ of 2nd order. A total of 137 observed values of the process are available. The data, summarized in tabular form, are shown in Table 4.3. The cost function is

$$C(z(t), \hat{z}(t)) = |z(t) - \hat{z}(t)|$$

Therefore, the cost matrix is

$$C = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}$$

The calculations are shown in the table. The MC column is obtained by post-multiplying the M column by the C matrix. Notice from the table that in the second-to-last row ($z(t-2) = 2$, $z(t-1) = 1$), two MC entries are equal. Therefore, the corresponding $\hat{z}$ entry is $d_{01}$, which stands for "don't care as long as it is 0 or 1". The optimum regression function corresponding to $d_{01} = 0$ is

$$\begin{aligned}
\hat{z}(t) &= z^0(t-2)z^0(t-1) + z^0(t-2)z^1(t-1) + z^0(t-2)z^2(t-1) \\
&\quad + 2z^1(t-2)z^0(t-1) + 2z^1(t-2)z^1(t-1) + 2z^1(t-2)z^2(t-1) \\
&= z^0(t-2) + 2z^1(t-2) \\
&= -\tfrac{3}{2}(z(t-2))^2 + \tfrac{5}{2}z(t-2) + 1
\end{aligned}$$

An equivalent, rather simple, expression for the above $\hat{z}(t)$ is

$$\hat{z}(t) = 1 +_3 z(t-2)$$

where $+_3$ denotes "addition modulo 3".

A second predictor, corresponding to $d_{01} = 1$, can, similarly, be written. The total cost in either case is 101.

TABLE 4.3 : Boolean Predictor for Data of Example 4.8.6.1

z(t-2)  z(t-1)   m_i0  m_i1  m_i2   h_i0   h_i1   h_i2   MC_0  MC_1  MC_2   ẑ_i    TC_i
  0       0        8     3     6    0.47   0.18   0.35    15    14    19     1      14
  0       1        5     6     7    0.28   0.33   0.39    20    12    16     1      12
  0       2        2     5     6    0.15   0.38   0.46    17     8     9     1       8
  1       0        5     1     7    0.38   0.08   0.54    15    12    11     2      11
  1       1        9     1    11    0.43   0.05   0.52    23    20    19     2      19
  1       2        3     4     8    0.20   0.27   0.53    20    11    10     2      10
  2       0        7     2     2    0.64   0.18   0.18     6     9    16     0       6
  2       1        9     4     5    0.50   0.22   0.28    14    14    22     d01    14
  2       2        6     3     2    0.55   0.27   0.18     7     8    15     0       7

                                                                   Total Cost = 101
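The k-ary tabular method of Section 4.8.6 can be reproduced with a short Python sketch. The names `TABLE_4_3`, `mc_row` and `tabulate_kary` are hypothetical; the counts are those of Table 4.3 and the cost matrix is $c_{uv} = |u - v|$ as in the example. Rows where several values of $v$ tie for the minimum are reported as a list of admissible choices, which plays the role of a "don't care" entry.

```python
# Counts (m_i0, m_i1, m_i2) from Table 4.3 for states i = 3*z(t-2) + z(t-1).
TABLE_4_3 = [(8, 3, 6), (5, 6, 7), (2, 5, 6),
             (5, 1, 7), (9, 1, 11), (3, 4, 8),
             (7, 2, 2), (9, 4, 5), (6, 3, 2)]

K = 3
COST = [[abs(u - v) for v in range(K)] for u in range(K)]  # c_uv = |u - v|


def mc_row(m_row, cost):
    """Post-multiply one row of M by C: MC_v = sum_u m_iu * c_uv."""
    k = len(m_row)
    return [sum(m_row[u] * cost[u][v] for u in range(k)) for v in range(k)]


def tabulate_kary(counts, cost):
    """Return (MC row, admissible predictions, TC_i) for every state i."""
    rows = []
    for m_row in counts:
        mc = mc_row(m_row, cost)
        best = min(mc)
        choices = [v for v, value in enumerate(mc) if value == best]
        rows.append((mc, choices, best))
    return rows


rows = tabulate_kary(TABLE_4_3, COST)
total_cost = sum(tc for _, _, tc in rows)
# Predictor with every tie broken towards the smallest value (d01 = 0).
zhat = [choices[0] for _, choices, _ in rows]
print(total_cost)   # 101, as in Table 4.3
```

With ties broken this way, `zhat` depends only on $z(t-2)$ and agrees with $1 +_3 z(t-2)$.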
4.9 ON MODEL ORDER DETERMINATION

In the theory that we have developed so far we have assumed that the model order n is known. In practice, this may not always be true. In the case of Gaussian processes there are many criteria and tests (e.g., see [Aka74]) that allow us to determine an optimal model order from empirical data. The corresponding results for Boolean models are yet to be developed. Some rudimentary ideas on this problem are presented in this section.

It should be obvious that the prediction error (or the total cost of prediction) monotonically decreases as the model order is increased. A quantitative formula for the increase in error when the order is reduced by one is given by the following theorem.

4.9.1 Theorem : The increase in cost in going from an (n+1)st order model to an nth order model is given by:

$$TC(n) - TC(n+1) = \sum_{i=0}^{2^n-1} \Big[ \min\{ c_2(m_{i1} + m_{i'1}),\ c_1(m_{i0} + m_{i'0}) \} - \min(c_2 m_{i1},\ c_1 m_{i0}) - \min(c_2 m_{i'1},\ c_1 m_{i'0}) \Big]$$

where the m-values are for the (n+1)st order model, and $i' = 2^n + i$.

Proof : Let $m'$ denote the m-values for the nth order model. For example,

$$\begin{aligned}
m'_{i1} &= \text{\# of times } Z_t^n = i \text{ is followed by } z(t) = 1 \\
&= \text{\# of times } z(t-n-1) = 0,\ Z_t^n = i \text{ is followed by } z(t) = 1 \\
&\quad + \text{\# of times } z(t-n-1) = 1,\ Z_t^n = i \text{ is followed by } z(t) = 1 \\
&= \text{\# of times } Z_t^{n+1} = i \text{ is followed by } z(t) = 1 \\
&\quad + \text{\# of times } Z_t^{n+1} = 2^n + i \text{ is followed by } z(t) = 1 \\
&= m_{i1} + m_{i'1}
\end{aligned}$$

Similarly, $m'_{i0} = m_{i0} + m_{i'0}$. Hence,

$$TC(n) = \sum_{i=0}^{2^n-1} \min(c_2 m'_{i1},\ c_1 m'_{i0})$$

and

$$TC(n+1) = \sum_{i=0}^{2^{n+1}-1} \min(c_2 m_{i1},\ c_1 m_{i0}) = \sum_{i=0}^{2^n-1} \big[ \min(c_2 m_{i1},\ c_1 m_{i0}) + \min(c_2 m_{i'1},\ c_1 m_{i'0}) \big]$$

Notice that in the last equation the upper limit of the summation is $2^n - 1$ instead of $2^{n+1} - 1$. The difference of the above two equations gives the theorem as stated.

[Q.E.D.]

There are two implications of this theorem. Firstly, each summation term is of the form "the minimum of sums minus the sum of minima". Hence, each term is non-negative. This proves the statement that the cost monotonically goes down as the order increases. The second implication, which becomes obvious from the proof, is that the m-values for the nth order model can be obtained from those of the (n+1)st order model by summing up values that are $2^n$ apart. Thus, once the data has been summarized in a tabular form for high enough n, all lower order models can be easily calculated. An example is shown in Table 4.4. Here $m'_{21}$ for the 2nd order model is obtained by adding $m_{21}$ and $m_{61}$ ($6 = 2 + 2^n$, $n = 2$) of the 3rd order model and so on. Thus, starting from a large n, one can calculate the total cost TC(n) for that and all lower order models. A plot of TC(n) vs n will look similar to that shown in Figure 4.2a.
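The second implication lends itself to a direct computation. The sketch below (hypothetical names `collapse`, `total_cost`, `cost_by_order`) starts from the 4th order counts of Table 4.2, repeatedly adds rows that are $2^n$ apart to obtain the lower order tables, and records TC(n) at each step. It assumes, as the text does, that the same counts can be reused for every order, i.e., it ignores the few observations at the start of the record that are available to a lower order model only.

```python
# (m_i0, m_i1) for the 4th order model, taken from Table 4.2.
TABLE_4_2 = [(1, 9), (8, 2), (3, 8), (7, 1), (3, 2), (9, 7), (2, 4), (6, 0),
             (5, 3), (1, 8), (2, 9), (0, 7), (2, 8), (7, 5), (1, 2), (3, 9)]


def collapse(counts):
    """m-values of the nth order model from the (n+1)st order model:
    m'_i = m_i + m_{i + 2^n}, i.e. rows that are 2^n apart are added."""
    half = len(counts) // 2
    return [(counts[i][0] + counts[i + half][0],
             counts[i][1] + counts[i + half][1]) for i in range(half)]


def total_cost(counts, c1, c2):
    """Total Cost Theorem for the binary case: sum_i min(c2*m_i1, c1*m_i0)."""
    return sum(min(c2 * m1, c1 * m0) for m0, m1 in counts)


def cost_by_order(counts, c1, c2):
    """Return {n: TC(n)} for the given order and every lower order down to 0."""
    costs = {}
    while True:
        n = len(counts).bit_length() - 1    # len(counts) == 2**n
        costs[n] = total_cost(counts, c1, c2)
        if n == 0:
            return costs
        counts = collapse(counts)


costs = cost_by_order(TABLE_4_2, c1=2, c2=1)
```

The resulting dictionary gives the points of a TC(n) versus n plot; by the theorem the values cannot increase with n.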

In choosing the model order, a compromise must be made between the amount of computation required for the model and the improvement obtainable by the model. The complexity of the model is exponential, i.e., $O(2^n)$. Hence, the net utility of an nth order model is $TC(n) - a2^n$, where $a$ is some normalizing constant. The optimal order is obviously the one that maximizes this utility (see Figure 4.2). Another fact that should be pointed out in this regard is that as the model order increases, the number of parameters to be estimated increases and, hence, the precision (or confidence) of parameter values may go down. At present, we do not have formulae for parameter confidence.

Figure 4.2 : Determination of model order n.
4.10 ON STATIONARITY

A stochastic process is called stationary if its probabilistic behavior is independent of the time origin. In our Boolean model we assumed $P[z(t) \mid Z_t^n] = h(Z_t^n)$ to be independent of time. For a stationary process this is obviously a valid assumption. For a general non-stationary process the model should be

$$P[z(t) \mid Z_t^n] = h(t, Z_t^n) = \sum_{i=0}^{2^n-1} h_i(t)\, q_i(Z_t^n)$$

i.e., the model parameters are functions of time. We do not know how to estimate these time varying parameters. Nor do we have any tests for stationarity (similar to the ACF going to zero for continuous processes). However, what we do know is that the time-independent Boolean model applies also to the so called "homogeneous non-stationary" processes. A non-stationary process is called homogeneous if its dth difference is stationary for some d. Recall that in the case of continuous processes ARIMA rather than ARMA models are used to model such homogeneous processes. The following theorem proves the above statement.

Theorem : If the dth difference of a k-ary process z(t) follows an nth order time-independent Boolean equation then the process itself follows a (d+n)th order time-independent equation.

Proof : Consider the 1st difference process y(t):

$$y(t) = z(t) - z(t-1)$$

so that

$$\sum_{j=1}^{t} y(j) = z(t) - z(0)$$

Hence,

$$\begin{aligned}
P[z(t) \mid Z_t^{n+1}] &= P[z(t) \mid z(t-n-1), z(t-n), \ldots, z(t-1)] \\
&= P\Big[z(0) + \sum_{j=1}^{t} y(j) \,\Big|\, z(0) + \sum_{j=1}^{t-n-1} y(j),\ z(0) + \sum_{j=1}^{t-n} y(j),\ \ldots,\ z(0) + \sum_{j=1}^{t-1} y(j)\Big] \\
&= P\Big[y(t) \,\Big|\, z(0) + \sum_{j=1}^{t-n-1} y(j),\ y(t-n),\ y(t-n+1),\ \ldots,\ y(t-1)\Big] \\
&= P[y(t) \mid y(t-n),\ y(t-n+1),\ \ldots,\ y(t-1)] \\
&= \text{independent of time if } y(t) \text{ is nth order}
\end{aligned}$$

Thus we see that if the first difference y(t) has an nth order time-independent Boolean model, then z(t) has an (n+1)st order time-independent model. By taking successive differences, the theorem for the dth difference follows.

[Q.E.D.]

The implication of this theorem is that we can use the theory developed so far for our page reference process whose 1st difference was shown to be stationary in the last chapter.
4.11 PAGE REPLACEMENT USING THE BOOLEAN MODEL


There are two ways of using the Boolean model for page replacement. The first is simply to use the model to predict z(t) from the knowledge of z(t-1), ..., z(t-n). In this case, one must choose as the modeling interval T = R/U, the ratio of replacement and usage costs. As was shown in the last chapter, the cost criterion in this case is least-squares. This method is straightforward and we do not develop it any further here.

An alternative method arises from the fact that with the Boolean model we are not restricted to the least-squares cost criterion. Hence, we can design a page replacement policy without any restriction on T. In this section we develop such a policy. The policy is a realizable version of a theoretically optimal, but unrealizable, policy called VMIN [PrF76]. In order to see the optimality of VMIN, consider a particular page. Without loss of generality, we can assume that the modeling interval T is unity. Supposing we know the complete page reference process (past as well as future), let s be the length of time for which the page is not referenced following t, i.e., $z(t), z(t+1), \ldots, z(t+s-1)$ are all zero and $z(t+s)$ is 1.

Let d(t) = decision to remove the page at time t:

$$d(t) = \begin{cases} 1 & \Rightarrow \text{page is removed} \\ 0 & \Rightarrow \text{page is kept in the main memory} \end{cases}$$

The cost of the decision d(t) over the interval (t, t+s) is

$$C = R\,d(t) + sU\,\bar{d}(t) = sU + (R - sU)\,d(t)$$

where $\bar{d}(t) = 1 - d(t)$. Since the cost is linear in d(t), the optimal decision is d(t)=1 iff R - sU < 0, i.e., d(t)=1 iff s > R/U.

Thus the optimal policy is to keep the page if it is going to be referenced in the next R/U interval. This is the VMIN policy. However, this is unrealizable because it uses both past and future information. A realizable version that uses only past information can be derived as follows.
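Before turning to the realizable version, the clairvoyant rule can be illustrated on a recorded reference string. This is only a sketch with hypothetical names (`forward_recurrence_time`, `vmin_decision`, `decision_cost`); the treatment of a page that is never referenced again within the trace (remove it) is an assumption made here, not something stated in the text.

```python
def forward_recurrence_time(z, t):
    """Return s such that z[t], ..., z[t+s-1] are 0 and z[t+s] is 1,
    or None if the page is not referenced again within the trace."""
    s = 0
    while t + s < len(z):
        if z[t + s] == 1:
            return s
        s += 1
    return None


def vmin_decision(z, t, R, U):
    """d(t) = 1 (remove the page) iff s > R/U. Uses future references,
    so it is only computable offline."""
    s = forward_recurrence_time(z, t)
    if s is None:        # assumption: no further reference, so remove
        return 1
    return 1 if s * U > R else 0


def decision_cost(s, d, R, U):
    """C = R*d + s*U*(1 - d) = s*U + (R - s*U)*d."""
    return R * d + s * U * (1 - d)


# Reference string of one page; replacement cost R = 2, usage cost U = 1.
z = [0, 0, 0, 1, 0, 1]
decisions = [vmin_decision(z, t, R=2, U=1) for t in range(len(z))]
```

At t = 0 the next reference is three intervals away, which exceeds R/U = 2, so the page is removed; at t = 4 the next reference is one interval away and the page is kept.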


Since the future is unknown, the "forward recurrence time" s is a random variable and the expected cost $E[C] = R\,d(t) + U\,\bar{d}(t)\,E[s]$ is to be minimized. The optimal decision based on the past information is, therefore, to choose d(t)=1 iff E[s] is greater than R/U. The distribution of s, and hence its expected value, can be derived from the Boolean model of the process as described by the following two theorems.


4.11.1 Theorem* : The "reverse cumulative distribution" (1 - cumulative distribution) of s is given by

$$r_{iu} = P[s > u \mid Z_t^n = i] = \prod_{j=0}^{u} \bar{h}_{2^j i}$$

where $\bar{h}_i = 1 - h_i = P[z(t) = 0 \mid Z_t^n = i]$.

* In this chapter we use the convention that whenever i appears in a subscript, the expression up to i is evaluated modulo $2^n$. Anything following i is simply another subscript. Thus $r_{iu}$ is "r sub i sub u", whereas $h_{3i}$ is "h sub (3 times i modulo $2^n$)". For example, for n=2, i=3, $h_{3i} = h_9 = h_1$.

Proof :

$$r_{i0} = P[s > 0 \mid Z_t^n = i] = P[z(t) = 0 \mid Z_t^n = i] = \bar{h}_i$$

$$\begin{aligned}
r_{iu} &= P[s > u \mid Z_t^n = i] \\
&= P[z(t+u) = 0,\ z(t+u-1) = 0,\ \ldots,\ z(t) = 0 \mid Z_t^n = i] \\
&= P[z(t+u) = 0 \mid Z_t^n = i,\ z(t) = 0,\ \ldots,\ z(t+u-1) = 0] \cdot P[z(t+u-1) = 0,\ \ldots,\ z(t) = 0 \mid Z_t^n = i] \\
&= \bar{h}_{2^u i}\, r_{i,u-1}
\end{aligned}$$

The above equation gives a recursive expression for $r_{iu}$. By applying the recursion for successively decreasing values of u, and using the initial condition $r_{i0} = \bar{h}_i$, we get the theorem as stated.

[Q.E.D.]

Corollary : $p_{iu} = P[s = u \mid Z_t^n = i] = h_{2^u i}\, r_{i,u-1}$

Proof : $p_{iu} = r_{i,u-1} - r_{iu} = r_{i,u-1} - \bar{h}_{2^u i}\, r_{i,u-1} = h_{2^u i}\, r_{i,u-1}$

[Q.E.D.]


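4.11.2 Theorem : Let $s_i = E[s \mid Z_t^n = i]$. Then, $s_i$ is given by the following recursive equation:

$$s_i = \bar{h}_i (1 + s_{2i}), \qquad i = 0, 1, \ldots, 2^n-1$$

and $s_0 = \bar{h}_0 / h_0$.

Proof :

$$\begin{aligned}
s_i &= \sum_{u=0}^{\infty} u\,(r_{i,u-1} - r_{iu}) \\
&= 1(r_{i0} - r_{i1}) + 2(r_{i1} - r_{i2}) + 3(r_{i2} - r_{i3}) + \cdots \\
&= r_{i0} + r_{i1} + r_{i2} + \cdots \\
&= \bar{h}_i + \bar{h}_i \bar{h}_{2i} + \bar{h}_i \bar{h}_{2i} \bar{h}_{4i} + \cdots \\
&= \bar{h}_i (1 + \bar{h}_{2i} + \bar{h}_{2i} \bar{h}_{4i} + \cdots) \\
&= \bar{h}_i (1 + s_{2i})
\end{aligned}$$

By substituting i=0 in the above equation we get $s_0 = \bar{h}_0 (1 + s_0)$, or

$$s_0 = \frac{\bar{h}_0}{h_0}$$

Alternatively, one can derive an expression for $s_0$ from $\sum_{u=0}^{\infty} u\, P[s = u \mid Z_t^n = 0]$. The result is the same as above.

[Q.E.D.]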


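Using Theorem 4.11.2 one can get an expression for all $s_i$, $i = 0, 1, 2, \ldots, 2^n-1$ in terms of $h_i$. However, one must follow a particular order. After calculating $s_i$ for $i = v$, calculate $s_i$ for $i = v/2$ and $2^{n-1} + v/2$. For example, for n=2 the following expressions are obtained.

$$s_0 = \frac{\bar{h}_0}{h_0} = \frac{m_{00}}{m_{01}}$$

$$s_2 = \bar{h}_2 (1 + s_4) = \bar{h}_2 (1 + s_0) = \frac{\bar{h}_2}{h_0} = \frac{m_{20}}{m_{20} + m_{21}} \cdot \frac{m_{00} + m_{01}}{m_{01}}$$

$$s_1 = \bar{h}_1 (1 + s_2) = \bar{h}_1 \left(1 + \frac{\bar{h}_2}{h_0}\right) = \frac{m_{10}}{m_{10} + m_{11}} \left(1 + \frac{m_{20}(m_{00} + m_{01})}{m_{01}(m_{20} + m_{21})}\right)$$

$$s_3 = \bar{h}_3 (1 + s_6) = \bar{h}_3 (1 + s_2) = \bar{h}_3 \left(1 + \frac{\bar{h}_2}{h_0}\right) = \frac{m_{30}}{m_{30} + m_{31}} \left(1 + \frac{m_{20}(m_{00} + m_{01})}{m_{01}(m_{20} + m_{21})}\right)$$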
Thus, using the estimated value of the forward recurrence time for the current state, one can decide whether to replace a page or not. This version of the "VMIN" algorithm, although realizable, is too expensive to implement in practice. This is because for an nth order model $2^{n+1}$ registers are required to hold m-values. This number is prohibitively large. Even for n=2, eight registers are required for each page. To manage a page of 1048 words, using eight registers is not economical. Therefore, at the present time we do not discuss implementation aspects of the algorithms developed in this chapter. However, with rapidly advancing memory technology it is quite possible that pages of the future will be much bigger and eight registers per page would then not be an expensive proposition. If that happens, it might be interesting to do empirical analysis of the page reference process and develop a Boolean model of it.
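The recursion of Theorem 4.11.2 and the resulting replacement rule can be sketched as follows. The function names and the m-values in the usage lines are hypothetical, chosen only so that the arithmetic is easy to follow. The sketch assumes that every state has been observed at least once and that $h_0 > 0$; the recursion terminates because doubling any state index modulo $2^n$ reaches state 0 in at most n steps.

```python
from fractions import Fraction


def expected_recurrence_times(counts):
    """s_i = E[s | Z_t^n = i] from counts[i] = (m_i0, m_i1), using
    s_0 = hbar_0 / h_0 and s_i = hbar_i * (1 + s_{2i mod 2^n})."""
    size = len(counts)                                   # 2**n states
    hbar = [Fraction(m0, m0 + m1) for m0, m1 in counts]  # P[z(t)=0 | state i]
    s = {0: hbar[0] / (1 - hbar[0])}                     # needs h_0 > 0

    def solve(i):
        if i not in s:
            s[i] = hbar[i] * (1 + solve((2 * i) % size))
        return s[i]

    return [solve(i) for i in range(size)]


def remove_page(state, s, R, U):
    """Realizable rule: d(t) = 1 iff E[s | current state] > R/U."""
    return s[state] * U > R


# Illustrative second order model (n = 2): (m_i0, m_i1) for states 0..3.
counts = [(3, 1), (1, 1), (1, 3), (0, 2)]
s = expected_recurrence_times(counts)
```

For these counts the sketch gives $s_0 = 3$, $s_1 = 1$, $s_2 = 1$ and $s_3 = 0$, in agreement with the closed-form expressions for n = 2 given above; with R/U = 2 only state 0 leads to removal.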
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In  this  chapter  we  have  proposed  a direct  approach  to  modeling, 
estimation,  and  prediction  of  a k-ary  process.  The  process  is  modeled 
as  the  output  of  a Boolean  system  driven  by  a set  of  k-ary  white  noises. 
The  model  makes  use  of  the  special  properties  of  pseudo-Boolean 
functions  in  sum  of  products  form.  An  expression  for  the  likelihood 
function  has  been  developed.  Using  this  expression  a formula  for 
calculating  the  maximum  likelihood  estimate  of  the  model  parameters  has 
been  derived.  A method  of  finding  the  optimal  non-linear  predictor  has 
been  developed.  The  method  makes  use  of  the  sample  frequency 
distributions  of  the  fundamental  products. 

Two different ways of designing memory management policies based on the Boolean model have been presented. The algorithms, although physically realizable, are not economical enough for practical implementation at the current state of technology. However, the research reported here is valuable from the control-theoretic point of view for application to other systems.

In  the  case  of  Gaussian  variables,  the  joint  probability 
distribution  of  n variables  is  completely  specified  by  specifying  the 
mean  and  covariance  matrix.  Therefore,  while  analyzing  Gaussian 
processes,  we  summarize  the  data  in  terms  of  the  autocorrelation 
function. In the case of binary variables, we find that autocorrelation has no importance. Instead, the role is played by nth order moments, i.e., the expected values of fundamental product terms. Thus, whereas the sufficient statistic* of a sample of n Gaussian variables has n(n+1)/2 terms (n means, n(n-1)/2 variances), that of n binary variables has $2^n - 1$ ($k^n - 1$ in the k-ary case) terms.

We  have  partly  resolved  the  question  of  "representation”.  In  the 
case  of  Gaussian  processes,  the  representation  theorem  [Ast70]  states 
that  every  stationary  stochastic  process  with  rational  spectral  density 
can  be  represented  as  the  output  of  a linear  system  driven  by  white 
noise.  In  this  chapter,  we  have  shown  that  any  stationary  finite  order 
Markov  process  can  be  represented  as  the  output  of  a Boolean  system 
driven  by  a set  of  k-ary  noises. 




The Boolean approach to the analysis of k-ary processes parallels that conventionally used for Gaussian processes. However, as this is
the  first  time  that  this  approach  has  been  taken,  many  issues  remain  to 
be  resolved.  In  particular,  the  problem  of  order  determination  needs 
further  research.  Nevertheless,  some  of  the  results  obtained  are  more 
general  than  those  known  for  Gaussian  processes.  For  example,  our  model 
gives  the  optimal  non-linear  predictor  for  any  given  linear  or 
non-linear  cost  function,  whereas  most  of  the  literature  on  Gaussian 
processes  deals  with  the  optimal  linear  predictor  for  the  least-squares 
cost  function. 


* The  sufficient  statistic  is  the  minimal  set  of  statistical  summaries 
that  contains  all  the  useful  information  in  the  sample  data. 


5.1 SUMMARY OF RESULTS

Most  resource  management  problems  are  basically  prediction 
problems.  Therefore,  we  advocate  the  use  of  modern  stochastic  control 
theory  to  formulate  operating  systems  resource  management  policies.  In 
this  thesis,  we  have  proposed  a general  approach  to  the  prediction  of 
resource  demands  of  a program  based  on  its  past  behavior. 

We  exemplified  the  approach  by  applying  it  to  the  problems  of  CPU 
management  and  memory  management.  One  interesting  outcome  of  the 
research  reported  here  is  that  our  control-theoretic  approach  also 
provides  an  explanation  for  many  previously  described  policies  that  are 
based  on  completely  non-control-theoretic  principles. 

In the case of CPU management, it was shown that the successive
CPU  demands  of  a program  constitute  a stationary  white  noise  process. 
Therefore,  the  best  predictor  for  the  future  demand  is  the  current  mean 
value.  Several  different  schemes  for  adaptively  predicting  the  demand 
were  proposed.  An  adaptive  scheduling  algorithm  called  SPRPT  was 
described.  It  turns  out  that  Dijkstra's  "T.H.E."  operating  system  uses 
a scheme  similar  to  one  of  the  proposed  ones.  Thus,  we  also  have  a 
control-theoretic  derivation  and  explanation  of  T.H.E.'s  CPU  management 
policy. 

In the case of memory management, we started with a very simple stochastic process model and still obtained significant results. We showed that the cost of memory management is proportional to the square of the prediction error. Empirical analysis showed that the process is non-stationary and that an ARIMA(1,1,1) model is appropriate. A new page replacement algorithm called "ARIMA" was, therefore, proposed. The algorithm is not only easy to implement, it also unifies many other algorithms previously cited in the literature. In particular, Working Set and the Independent Reference Models were shown to be boundary cases of the algorithm proposed in this thesis.

The  memory  management  process  is  a binary  process.  The  absence  of 
suitable  techniques  for  prediction  of  such  processes  led  us  to  develop 
new  techniques  for  modeling,  estimation  and  prediction  of  binary 
processes.  We  later  extended  these  techniques  to  k-ary  processes  also. 

Our approach was to model these processes as the output of a Boolean system. This "Boolean approach" allowed us to find the optimal non-linear predictor for the process under any given non-linear cost function. The model was shown to be applicable to a subclass of non-stationary processes also. However, when applied to the memory management problem, the resulting algorithm, though optimal, is rather expensive to implement for currently used page sizes. Nevertheless, the research reported here is important from the control-theoretic viewpoint for application to other systems.
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There  are  many  avenues  along  which  the  research  reported  in  this 
thesis can be extended. The first possibility is to investigate the problem of joint management of CPU and memory. In this thesis, CPU and
memory  demands  have  been  modeled  as  independent  processes.  Strictly 
speaking  this  is  not  true;  the  CPU  demand  is  affected  by  the  memory 
policy.  For  example,  a bad  memory  policy  may  result  in  frequent  page 
faults  causing  tasks  to  be  descheduled  prematurely. 


As  far  as  the  analysis  of  binary  or  k-ary  processes  is  concerned, 
there  are  many  issues  that  need  to  be  resolved.  In  particular,  the 
problem  of  order  determination  needs  further  research.  Tests  for 
stationarity and models for non-stationary k-ary processes should be
developed.  The  possibility  of  using  less  expensive,  though  suboptimal, 
predictors  should  be  investigated.  This  is  particularly  desirable  for 
application  to  memory  management. 

The  control-theoretic  approach  can  be  extended  to  the  management 
of  other  resources,  e.g.,  disks.  The  disk  scheduling  policy  can  be 
optimized  if  the  disk  demand  behavior  of  programs  is  predicted  in 
advance.

The  approach  can  also  be  used  for  the  modeling  of  other  systems. 
For  example,  in  a database,  the  record  access  patterns  can  be  modeled  as 
a stochastic process and its prediction used to determine the optimal organization and, hence, the reorganization points of the database. In the case of computer networks, the arrival patterns of packets at a node
can  be  modeled  as  a binary  stochastic  process.  The  forecast  of  future 
packet  arrivals  can  then  be  used  for  flow  control  or  to  avoid  congestion 
in  the  network. 

The  essence  of  our  philosophy  in  this  thesis  is  that 
control-theorists have made good use of computers to develop better and
faster  modeling,  estimation  and  prediction  techniques.  It  is  now  time 
for  computer  scientists  to  use  these  techniques  to  develop  better  and 
faster  computer  systems. 
A stochastic process is a sequence of random variables, say, z(1), z(2), ..., z(t), .... The simplest stochastic process is white noise e(t). It has the property that any two elements e(t) and e(t+k) of the process are not correlated. The process in which only consecutive elements, i.e., z(t) and z(t+1), t=1,2,... are correlated is represented by a moving average model of order 1 ( MA(1) ):

z(t)  a w + e(t)  - b1e(t.1) 

Here  the  expression  "e(t)  - b^e(t-1)n  represents  a moving  average  of  the 
white  noise  process  e(t),  and  w is  a constant  used  to  balance  the  mean 
on  the  two  sides  of  the  equation.  A moving  average  process  of  order  q 
( MA(q)  ) is  similarly  represented  by 

z(t)  a w ♦ e(t ) - b1#(t-1)  - b2e(t-2)  - ...  -bqe(t-q) 

On  the  other  hand  the  process  represented  by 

z(t)  - alZ(t-l)  - a2z(t-1)  - ...  -apz(t-p)  = w + e(t) 
is  called  an  autoregressive  process  of  order  p ( AR(p)  ).  The  name 
clearly  indicates  that  the  process  z(t)  depends  (regresses)  on  its  p 
past  values.  A process  which  has  both  AR  and  MA  parts  is  called  an 
"ARMA"  process.  The  ARMA(p,q)  model  is  given  by  the  following  equation: 

z(t)-alZ(t.i)...._ap2(t_p)  a w+e(t )-b1e(t-1 )-.. .-bqe(t-q) 

For  a process  z(t)  its  dth  difference  is  defined  as  follows: 

Dz(t)  s z(t)  - z(t-1) 

D2z(t)  a Dz(t ) - Dz(t-I) 


Ddz(t)  a Dd-'ZU)  - Dd-lz(t-l) 
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Now  if  y(t)  = Ddz(t)  is  the  dth  difference  process  of  z(t),  then  z(t)  is 
called  the  d**1  integrated  process  of  y(t).  Thus  if  y(t)  is  shown  to  be 
an  ARMA(p,q)  process,  z(t)  is  said  to  be  an  autoregressive  integrated 
moving  average  process  of  order  p,d,q,  i.e.,  ARIMA(p,d,q) . Using  the 
backward  shift  operator  B,  Bz(t)  = z(t-1),  the  ARIMA(p,d,q)  model  can  be 
written  as 

(1-alB-...-apBpKl-B)d*(t)  * w ♦ -bqBq)e<t) 
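The definitions above can be tried out numerically. The following is a minimal Python sketch, not part of the original report: the function names, the Gaussian choice for the white noise, and the convention that terms reaching before the start of the series are dropped are all assumptions made for illustration.

```python
import random


def difference(z, d=1):
    """Return the d-th difference D^d z, where Dz(t) = z(t) - z(t-1)."""
    for _ in range(d):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z


def integrate(y, z0=0.0):
    """Undo one difference: z(t) = z(t-1) + y(t), starting from z0."""
    z = [z0]
    for value in y:
        z.append(z[-1] + value)
    return z


def simulate_arma(a, b, w, n, seed=0):
    """Simulate z(t) - sum_i a_i z(t-i) = w + e(t) - sum_j b_j e(t-j).

    a and b hold the AR and MA coefficients (a_1.., b_1..). Terms that
    would refer to times before the start of the series are dropped.
    """
    rng = random.Random(seed)
    z, e = [], []
    for t in range(n):
        e.append(rng.gauss(0.0, 1.0))  # white noise, assumed Gaussian here
        ar = sum(a[i] * z[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ma = sum(b[j] * e[t - 1 - j] for j in range(len(b)) if t - 1 - j >= 0)
        z.append(w + ar + e[t] - ma)
    return z
```

Under these assumptions, an ARIMA(p,1,q) series would be obtained by passing the output of `simulate_arma` through `integrate`, and `difference` recovers the ARMA series from it.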

Further details on ARIMA models are given in [BoJ70].

B.1 Proof of Theorem 2.3.1: Let us assume that the tasks
$T_0, T_1, \ldots, T_{n-1}$ have been so numbered that

$$t_0 < t_1 < \cdots < t_{n-1}$$

This assumption of numbering does not cause any loss of
generality. If the tasks are not in the required order, we sort them in
the required order. Let $k'$ denote the index of the kth task in the sorted
sequence; then $t_{0'}, t_{1'}, \ldots, t_{(n-1)'}$ form the required ordered
sequence. The rest of the proof can now be carried out with non-primed
subscripts.

Also notice that we assume an increasing sequence rather than a
non-decreasing one. Thus we are excluding the possibility of two $t_i$'s
being equal. This is only to keep the proof simple. If equality is
allowed, the optimal sequence is no longer unique. However, the MFT is
the same for all optimal sequences, and hence the final cost expression
remains the same.

The minimum MFT with known $t_i$'s is

$$MFT_0 = \frac{1}{n} \sum_{i=0}^{n-1} (n-i)\, t_i$$

Now, if due to prediction error, the ith task $T_i$ is placed in the $k_i$th
position then

$$MFT_p = \frac{1}{n} \sum_{i=0}^{n-1} (n-k_i)\, t_i$$

so that the expected increase in MFT is

$$c = E[MFT_p] - MFT_0 = \frac{1}{n} \sum_{i=0}^{n-1} (i - \bar{k}_i)\, t_i \qquad \text{where } \bar{k}_i = E[k_i]$$

It only remains to find an expression for $E[k_i]$. Since

$$E[k_i] = \int_0^{\infty} E[k_i \mid \hat{t}_i = u]\, f_i(u)\, du$$

we need only show that

$$E[k_i \mid \hat{t}_i = u] = \sum_{j=0,\, j \neq i}^{n-1} F_j(u)$$

The easiest way to show this for any n and i is by the method of
induction. This is obviously true for n=1. Assuming that it is true
for n, we now show that it is true for n+1. For a set of n tasks, let

$$p_{v,n} = P[k_i = v \mid \hat{t}_i = u]$$

Now if an (n+1)st task $T_n$ is added, the task $T_i$ will
change position, at most, by 1. Therefore,

$$p_{v,n+1} = p_{v,n}\, P[\hat{t}_n > u] + p_{v-1,n}\, P[\hat{t}_n \le u]
           = p_{v,n}\{1 - F_n(u)\} + p_{v-1,n}\, F_n(u), \qquad v = 1, \ldots, n-1$$

The boundary conditions are

$$p_{0,n+1} = p_{0,n}[1 - F_n(u)] \qquad \text{and} \qquad p_{n,n+1} = p_{n-1,n}\, F_n(u)$$

Therefore, the expected position of $T_i$ among n+1 tasks is

$$E[k_i \mid \hat{t}_i = u,\ n+1 \text{ tasks}] = \sum_{v=0}^{n} v\, p_{v,n+1}$$

$$= n\, p_{n,n+1} + \sum_{v=1}^{n-1} \{ v\, p_{v,n}[1 - F_n(u)] + v\, p_{v-1,n}\, F_n(u) \}$$

$$= n\, p_{n-1,n}\, F_n(u) + [1 - F_n(u)] \sum_{v=1}^{n-1} v\, p_{v,n} + F_n(u) \sum_{v=1}^{n-1} v\, p_{v-1,n}$$

$$= [1 - F_n(u)]\, E[k_i \mid \hat{t}_i = u,\ n \text{ tasks}] + F_n(u) \{ 1 + E[k_i \mid \hat{t}_i = u,\ n \text{ tasks}] \}$$

$$= F_n(u) + E[k_i \mid \hat{t}_i = u,\ n \text{ tasks}]$$

$$= \sum_{j=0,\, j \neq i}^{n} F_j(u)$$

[Q.E.D.]
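The induction step can be checked numerically. The sketch below is an illustration, not the author's code: it builds the position distribution $p_{v,n}$ by adding the other tasks one at a time, exactly as the recursion does, and compares its mean with the sum of the $F_j(u)$. It assumes, as the recursion implicitly does, that the predictions for different tasks are independent; the function names are hypothetical.

```python
def position_distribution(F_values):
    """Distribution of the position of task T_i, given its prediction u.

    F_values[j] is F_j(u), the probability that the prediction for some
    other task j falls at or below u, so that task j is placed before T_i.
    """
    p = [1.0]                            # alone, T_i is at position 0
    for F in F_values:                   # add the other tasks one by one
        nxt = [0.0] * (len(p) + 1)
        for v, pv in enumerate(p):
            nxt[v] += pv * (1.0 - F)     # new task placed after T_i
            nxt[v + 1] += pv * F         # new task placed before T_i
        p = nxt
    return p


def expected_position(F_values):
    """Mean of the position distribution built by the recursion."""
    dist = position_distribution(F_values)
    return sum(v * pv for v, pv in enumerate(dist))
```

For any list of probabilities, `expected_position` should agree with `sum(F_values)` up to rounding, which is the statement proved above.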


B.2 Proof of Theorem 2.3.2: Again, as in theorem 2.3.1 we assume that
the tasks $T_1, \ldots, T_{n-1}$ have been so numbered that the ratios
$t_k / w_k$ form an increasing sequence, i.e.,

$$\frac{t_1}{w_1} < \frac{t_2}{w_2} < \cdots < \frac{t_{n-1}}{w_{n-1}}$$

Let the predicted value $t_p$ be such that the optimal position of
task $T_0$ according to SPT is after task i, i.e.,

$$\frac{t_i}{w_i} < \frac{t_p}{w_0} < \frac{t_{i+1}}{w_{i+1}}$$

whereas the real value $t_0$ is such that the optimal position of the task
$T_0$ would be after task j, i.e.,

$$\frac{t_j}{w_j} < \frac{t_0}{w_0} < \frac{t_{j+1}}{w_{j+1}}$$

Hence MWFT with $T_0$ after task $T_i$ is given by:

$$MWFT_p = \frac{1}{n} \Big\{ w_0 f_p + \sum_{k=1}^{n-1} w_k f_k \Big\}$$

where $f_k$ = finishing time of task k with $T_0$ after $T_i$

$$f_k = \begin{cases} \sum_{m=1}^{k} t_m, & 1 \le k \le i \\ t_0 + \sum_{m=1}^{k} t_m, & i+1 \le k \le n-1 \end{cases}$$

and

$$f_p = t_0 + \sum_{m=1}^{i} t_m$$

Notice that we use $t_0$ (and not $t_p$) in the above expression for $f_k$. This
is because the prediction error results only in misplacement of the
task. When executed it still takes only $t_0$.

Similarly, MWFT with $T_0$ after task $T_j$ is given by:

$$MWFT_0 = \frac{1}{n} \Big\{ w_0 f_0 + \sum_{k=1}^{n-1} w_k f'_k \Big\}$$

where $f'_k$ = finishing time of task $T_k$ with $T_0$ after $T_j$.

$$f'_k = \begin{cases} \sum_{m=1}^{k} t_m, & 1 \le k \le j \\ t_0 + \sum_{m=1}^{k} t_m, & j+1 \le k \le n-1 \end{cases}$$

and

$$f_0 = t_0 + \sum_{m=1}^{j} t_m$$

The increase in MWFT due to prediction error is given by:

$$c = MWFT_p - MWFT_0 = \frac{1}{n} \Big[ w_0 (f_p - f_0) + \sum_{k=1}^{n-1} w_k (f_k - f'_k) \Big]$$

Now there are two possible cases: i>j or j>i.

Case I : i>j  The predicted position is higher than the real position.
In this case,

$$f_k - f'_k = \begin{cases} 0, & 1 \le k \le j \\ -t_0, & j+1 \le k \le i \\ 0, & i+1 \le k \le n-1 \end{cases}$$

and

$$f_p - f_0 = \sum_{k=j+1}^{i} t_k$$

Therefore,

$$c_I = \frac{1}{n} \Big[ w_0 \sum_{k=j+1}^{i} t_k + \sum_{k=j+1}^{i} (-t_0 w_k) \Big]
     = \frac{1}{n} \sum_{k=j+1}^{i} (w_0 t_k - t_0 w_k)
     = \frac{1}{n} \sum_{k \in I} | w_0 t_k - t_0 w_k |$$

where $I = [j+1, i]$.

Case II : j>i  The predicted position is less than the real position.
In this case,

$$f_k - f'_k = \begin{cases} 0, & 1 \le k \le i \\ t_0, & i+1 \le k \le j \\ 0, & j+1 \le k \le n-1 \end{cases}$$

and

$$f_p - f_0 = - \sum_{k=i+1}^{j} t_k$$

Therefore,

$$c_{II} = \frac{1}{n} \Big[ -w_0 \sum_{k=i+1}^{j} t_k + t_0 \sum_{k=i+1}^{j} w_k \Big]
        = \frac{1}{n} \sum_{k=i+1}^{j} (-w_0 t_k + t_0 w_k)
        = \frac{1}{n} \sum_{k \in I} | w_0 t_k - t_0 w_k |$$

where $I = [i+1, j]$.

The two cases can now be combined together by redefining I as
follows:

$$c = \frac{1}{n} \sum_{k \in I} | w_0 t_k - t_0 w_k |$$

where $I = \{ k : t_k / w_k \in J \}$

and

$$J = \begin{cases} \left( \dfrac{t_0}{w_0}, \dfrac{t_p}{w_0} \right) & \text{if } t_p > t_0 \\[2ex] \left( \dfrac{t_p}{w_0}, \dfrac{t_0}{w_0} \right) & \text{if } t_0 > t_p \end{cases}$$

[Q.E.D.]
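As a sanity check on the combined expression, the increase in MWFT can be computed both directly (by scheduling with the predicted and with the real execution time) and from the closed form. The Python sketch below is illustrative only; the function names and the sample task set are assumptions, and ties between ratios are assumed not to occur, as in the proof.

```python
def mwft(order, t, w):
    """Mean weighted finishing time when tasks run in the given order."""
    clock, total = 0.0, 0.0
    for k in order:
        clock += t[k]                 # task k finishes at this time
        total += w[k] * clock
    return total / len(order)


def order_by_ratio(times, w):
    """Schedule in increasing order of times[k] / w[k]."""
    return sorted(range(len(w)), key=lambda k: times[k] / w[k])


def increase_direct(t, w, t_p):
    """MWFT_p - MWFT_0: task 0 is placed using t_p but still runs for t[0]."""
    predicted = [t_p] + list(t[1:])
    misplaced = mwft(order_by_ratio(predicted, w), t, w)
    optimal = mwft(order_by_ratio(t, w), t, w)
    return misplaced - optimal


def increase_formula(t, w, t_p):
    """(1/n) * sum over k in I of |w_0 t_k - t_0 w_k|, with t_k/w_k in J."""
    lo, hi = sorted((t[0] / w[0], t_p / w[0]))
    n = len(t)
    return sum(abs(w[0] * t[k] - t[0] * w[k])
               for k in range(1, n) if lo < t[k] / w[k] < hi) / n
```

With `t = [4, 1, 3, 6, 10]` and `w = [2, 1, 1, 1.5, 2]`, both functions give the same increase whether the prediction for task 0 is too high or too low.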


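B.3 Proof of Corollary 2.3.2.1: The corollary follows straightforwardly
from theorem 2.3.2 by substituting $w_0 = w_k = 1$. (It can also be obtained
from theorem 2.3.1 by substituting impulse functions for probability
density functions of the $\hat{t}_i$'s.)

[Q.E.D.]

B.4 Proof of Corollary 2.3.2.2: Substituting $w_0 = w_k = 1$ in the
expression for $c_I$ in the above proof, and using $t_k = kT$, we get

$$c_I = \frac{1}{n} \sum_{k=j+1}^{i} (t_k - t_0)
     = \frac{1}{n} \Big[ \sum_{k=j+1}^{i} t_k - t_0 (i-j) \Big]
     = \frac{1}{n} \Big[ \sum_{k=j+1}^{i} kT - t_0 (i-j) \Big]$$

$$= \frac{1}{n} \Big[ T \Big( \frac{i(i+1)}{2} - \frac{j(j+1)}{2} \Big) - t_0 (i-j) \Big]$$

so that

$$2nT c_I = (i^2 + i - j^2 - j)\, T^2 - 2 t_0 (i-j)\, T$$

Since $iT < t_p < (i+1)T$, $t_p \approx (i + \tfrac{1}{2}) T$.
Similarly, $t_0 \approx (j + \tfrac{1}{2}) T$.
Therefore,

$$(t_p - t_0)^2 \approx [(i-j) T]^2$$
$$= (i^2 + j^2 - 2ij)\, T^2$$
$$= (i^2 + i - j^2 - j + 2j^2 - i + j - 2ij)\, T^2$$
$$= (i^2 + i - j^2 - j)\, T^2 - 2 (i-j)(j + \tfrac{1}{2})\, T^2$$
$$= 2nT c_I$$

i.e.,

$$c_I = \frac{1}{2nT} (t_p - t_0)^2$$

Similarly for $c_{II}$.

[Q.E.D.]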

APPENDIX C

PROOFS OF THEOREMS ON K-ARY PROCESSES

Proof of Estimation Theorem : The proof is essentially similar to the
binary case except that now we have to maximize the likelihood function
under $k^n$ constraints:

$$\sum_{u=0}^{k-1} h_{iu} = 1, \qquad i = 0, 1, \ldots, k^n - 1$$

The model equations are

$$p_{tu} = P[z(t) = u \mid Z_{tn}, H] = \sum_{i=0}^{k^n - 1} h_{iu}\, q_i(Z_{tn}), \qquad u = 0, 1, \ldots, k-1$$

The above k equations can be combined into one as follows:

$$P[z(t) \mid Z_{tn}, H] = P[z(t)=0 \mid Z_{tn}, H]^{z^0(t)}\; P[z(t)=1 \mid Z_{tn}, H]^{z^1(t)} \cdots P[z(t)=k-1 \mid Z_{tn}, H]^{z^{k-1}(t)}
 = \prod_{u=0}^{k-1} p_{tu}^{\, z^u(t)}$$

where $z^u(t)$ is the uth Lagrangean function of $z(t)$.

The likelihood function for N observations is given by

$$L(H) = P[z(N), z(N-1), \ldots, z(1) \mid z(-n+1), \ldots, z(0), H]$$

$$= P[z(N) \mid z(N-1), \ldots, z(1), Z_{1n}, H]\; P[z(N-1) \mid z(N-2), \ldots, z(1), Z_{1n}, H] \cdots P[z(1) \mid Z_{1n}, H]$$

$$= \prod_{t=1}^{N} P[z(t) \mid Z_{tn}, H]$$

$$= \prod_{t=1}^{N} \prod_{u=0}^{k-1} p_{tu}^{\, z^u(t)}$$

Again as in the case of binary processes, we assume that the initial
conditions $z(-n+1), \ldots, z(0)$ are given (or are assumed equal to zero),
and only the parameters are to be estimated. The log likelihood
function is

$$l(H) = \log\{L(H)\} = \sum_{t=1}^{N} \sum_{u=0}^{k-1} z^u(t) \log p_{tu}$$

Now, using the Function Lemma we have,

$$\log p_{tu} = \sum_{i=0}^{k^n - 1} \log(h_{iu})\, q_i(Z_{tn})$$

Hence,

$$l(H) = \sum_{t=1}^{N} \sum_{u=0}^{k-1} z^u(t) \sum_{i=0}^{k^n - 1} \log(h_{iu})\, q_i(Z_{tn})$$

$$= \sum_{i=0}^{k^n - 1} \sum_{u=0}^{k-1} \log(h_{iu}) \sum_{t=1}^{N} z^u(t)\, q_i(Z_{tn})$$

$$= \sum_{i=0}^{k^n - 1} \sum_{u=0}^{k-1} m_{iu} \log(h_{iu})$$

where

$$m_{iu} = \sum_{t=1}^{N} z^u(t)\, q_i(Z_{tn})$$

= # of times $Z_{tn} = i$ is followed by $z(t) = u$

The last equality is a result of the fact that the quantity $z^u(t)\, q_i(Z_{tn})$
is 1 if and only if $z(t) = u$ and $Z_{tn} = i$. The maximum likelihood estimates
of the parameters are obtained by maximizing $l(H)$ under the constraints

$$\sum_{u=0}^{k-1} h_{iu} = 1, \qquad i = 0, 1, \ldots, k^n - 1$$

We use the method of Lagrange multipliers for constrained maximization.
The modified objective function is

$$l(H) = \sum_{i=0}^{k^n - 1} \sum_{u=0}^{k-1} m_{iu} \log(h_{iu}) + \sum_{i=0}^{k^n - 1} w_i \Big\{ 1 - \sum_{u=0}^{k-1} h_{iu} \Big\}$$

$$= \sum_{i=0}^{k^n - 1} w_i + \sum_{i=0}^{k^n - 1} \sum_{u=0}^{k-1} \big[ m_{iu} \log(h_{iu}) - w_i h_{iu} \big]$$

Here $w_i$ are Lagrange multipliers. The necessary conditions for
maximization are

$$\frac{\partial l}{\partial h_{iu}} = \frac{m_{iu}}{h_{iu}} - w_i = 0, \qquad i = 0, 1, \ldots, k^n - 1; \quad u = 0, 1, \ldots, k-1$$

$$\sum_{u=0}^{k-1} h_{iu} = 1, \qquad i = 0, 1, \ldots, k^n - 1$$

The first equation above implies that $h_{iu} = m_{iu} / w_i$. The second equation
implies that

$$\frac{1}{w_i} \sum_{u=0}^{k-1} m_{iu} = 1 \qquad \text{or} \qquad w_i = \sum_{u=0}^{k-1} m_{iu}$$

Therefore, the desired MLE of the parameters is

$$h_{iu} = \frac{m_{iu}}{\sum_{u=0}^{k-1} m_{iu}}, \qquad i = 0, 1, \ldots, k^n - 1; \quad u = 0, 1, \ldots, k-1$$

[Q.E.D.]
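The estimate derived above amounts to counting how often each history is followed by each symbol and normalizing each row. A minimal Python sketch follows, under stated assumptions: the history $Z_{tn}$ is encoded as a base-k integer with the oldest symbol most significant, counting starts at the first symbol that has n predecessors rather than using given initial conditions, and the names `history_index` and `estimate` are hypothetical.

```python
def history_index(window, k):
    """Encode the last n symbols as one integer i in 0 .. k**n - 1."""
    i = 0
    for symbol in window:
        i = i * k + symbol
    return i


def estimate(z, n, k):
    """Return (m, h) for a k-ary sequence z and history length n.

    m[i][u] counts how often history i is followed by symbol u, and
    h[i][u] = m[i][u] / sum_u m[i][u]. Rows for histories that never
    occur are left as None, since the estimate is undefined there.
    """
    m = [[0] * k for _ in range(k ** n)]
    for t in range(n, len(z)):
        m[history_index(z[t - n:t], k)][z[t]] += 1
    h = []
    for row in m:
        total = sum(row)
        h.append([count / total for count in row] if total else None)
    return m, h
```

For example, `estimate([0, 1, 2, 0, 1, 2, 0, 1, 0], n=1, k=3)` counts symbol 1 as being followed by 2 twice and by 0 once, so the corresponding row of `h` is one third, zero, two thirds.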


Proof of Prediction Theorem : Let the desired estimate be

$$\hat{z}(t) = g(Z_{tn})$$

where g is a Boolean function of $Z_{tn}$.

Using the Lagrangean development of g we have,

$$\hat{z}(t) = \sum_{i=0}^{k^n - 1} g_i\, q_i(Z_{tn})$$

where $g_i$ is a k-ary variable given by $g(Z_{tn} \mid Z_{tn} = i)$.

Applying the Function Lemma to the model and the predictor equation we
have

$$p_{tu} = \sum_{i=0}^{k^n - 1} h_{iu}\, q_i(Z_{tn})$$

$$\hat{z}^v(t) = \sum_{i=0}^{k^n - 1} g_i^v\, q_i(Z_{tn})$$

and, therefore, the expected cost is

$$E[C(\hat{z}(t), z(t))] = \sum_{v=0}^{k-1} \sum_{u=0}^{k-1} c_{uv}\, \hat{z}^v(t)\, p_{tu}$$

$$= \sum_{i=0}^{k^n - 1} \sum_{v=0}^{k-1} \sum_{u=0}^{k-1} c_{uv}\, g_i^v\, h_{iu}\, q_i(Z_{tn})$$

The cost function is linear in $g_i^v$. Obviously, $g_i$ should be so chosen
that the $g_i^v$ that has the smallest coefficient is one (all other $g_i^v$'s
will then automatically become zero), i.e.,

$$g_i = \arg \min_v \sum_{u=0}^{k-1} c_{uv}\, h_{iu}$$

Now, if $h_{iu}$ is determined according to the Estimation Theorem, then $h_{iu}$
is proportional to $m_{iu}$. Hence, the above formula for $g_i$ is equivalent
to that stated in the theorem.

[Q.E.D.]


Proof of Corollary 4.5.?.1 : In this case, $c_{uv} = 1$ iff $u \neq v$. Hence,

$$\sum_{u=0}^{k-1} c_{uv}\, m_{iu} = -m_{iv} + \sum_{u=0}^{k-1} m_{iu}$$

Hence, the v that maximizes $m_{iv}$ also minimizes the left hand side of the
above equation and hence the cost function.

[Q.E.D.]


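Proof of Total Cost Theorem :

$$TC = \sum_{t=1}^{N} c(z(t), \hat{z}(t))$$

$$= \sum_{t=1}^{N} \sum_{v=0}^{k-1} \sum_{u=0}^{k-1} c_{uv}\, \hat{z}^v(t)\, z^u(t)$$

$$= \sum_{t=1}^{N} \sum_{i=0}^{k^n - 1} \sum_{v=0}^{k-1} \sum_{u=0}^{k-1} c_{uv}\, g_i^v\, z^u(t)\, q_i(Z_{tn})$$

$$= \sum_{i=0}^{k^n - 1} TC_i$$

where

$$TC_i = \sum_{v=0}^{k-1} \sum_{u=0}^{k-1} c_{uv}\, g_i^v\, m_{iu} = \min_v \sum_{u=0}^{k-1} c_{uv}\, m_{iu}$$

The last equality is valid because $g_i$ is chosen according to the
Prediction Theorem.

Hence,

$$TC = \sum_{i=0}^{k^n - 1} \min_v \sum_{u=0}^{k-1} c_{uv}\, m_{iu}$$

[Q.E.D.]

Proof of Corollary 4.5.5.? : This corollary follows straightforwardly
from the total cost theorem by substituting $c_{uv} = 1$ iff $u \neq v$.

[Q.E.D.]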
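The prediction rule and the total cost expression in this appendix can be illustrated with a short Python sketch. It is an illustration under assumptions, not the author's code: `m[i][u]` is taken to be the count of history i followed by symbol u, `cost[u][v]` the cost of predicting v when u occurs, ties are broken in favour of the smallest v, and the function names are hypothetical.

```python
def column_cost(row, cost, v):
    """sum_u cost[u][v] * m_iu for one history row and one prediction v."""
    return sum(cost[u][v] * row[u] for u in range(len(row)))


def optimal_predictor(m, cost):
    """For each history i, the v that minimizes sum_u cost[u][v] * m[i][u]."""
    k = len(cost)
    return [min(range(k), key=lambda v: column_cost(row, cost, v)) for row in m]


def total_cost(m, cost):
    """Sum over histories of the minimum column cost."""
    k = len(cost)
    return sum(min(column_cost(row, cost, v) for v in range(k)) for row in m)


def zero_one_cost(k):
    """Cost matrix with cost 1 iff the prediction differs from the outcome."""
    return [[0 if u == v else 1 for v in range(k)] for u in range(k)]
```

With the zero-one cost matrix the predictor simply picks, for each history, the symbol that followed it most often, and the total cost is the number of observations that disagree with that choice.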
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Abstract 

This  report  discusses  the  use  of  satellite  computers  in  system 
software  debugging.  In  this  approach  a satellite  computer 
keeps  track  of  the  system  states  and  provides  useful  information 
if  the  observed  system  crashes  due  to  a software  bug.  As  a 
byproduct  of  this  approach,  the  data  gathered  can  also  be  used 
to  measure  and  monitor  the  system  performance.  This  approach 
has  therefore  been  named  "SODEM"  (Satellite  Observer  DEbugger 
and  Monitor)  approach. 

The  design  considerations  and  structural  modules  of  SODEMs 
have  been  discussed  in  detail.  The  ideas  have  been  illustrated 
with  examples  from  an  actual  implementation  of  a SODEM  to 
observe  a DECsystem-10  using  a PDP-11  processor. 
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I.  Introduction 

The  high  cost  of  software  development  and  maintenance  has 
raised  a demand  for  new  tools  for  software  debugging.  In  the 
case  of  application  software,  this  demand  has  resulted  in  at 
least  two  kinds  of  new  developments.  Firstly,  in  the  development 
of  many  interpretive  languages  e.g.,  BASIC  and  ELI  which  can 
be  easily  debugged,  and  secondly  in  the  development  of  special 
debugging  software  for  non-interpretive  languages  e.g.,  FORDDT 
for  FORTRAN. 

In  the  case  of  system  software  such  as  an  operating  system  the 
problem  of  debugging  is  complicated  by  the  necessity  for  it  to 
be  done  on-line,  i.e.  the  system  must  be  debugged  while  several 
other  users  are  using  it.  Of  course,  this  does  not  apply  during 
the  initial  development  phase  at  the  vendor's  plant  where  it 
is easy to have a machine for each software development. The
scenario that we have in mind is a more usual one: the one at
a customer's site where he has only one copy of the hardware
and a relatively bug-free system software which he has to
frequently modify to satisfy the changing user demands. The
users are allowed to run on the modified software. Normally
the system runs fine, except that once in a while a bug
hidden somewhere brings the system down. To locate this bug
what we need is an independent observer which keeps track of
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the  system  states  during  its  use  and  provides  helpful 
information  for  post  crash  analysis. 

One  approach  to  provide  this  independent  observer  is  to  use 
a virtual  machine  monitor  to  provide  several  copies  of  the 
hardware  (GaG72) . The  system  software  is  run  on  one  copy  and 
the  observer  on  another  so  that  the  latter  remains  alive  when 
the former crashes. This idea, though theoretically sound, may
not work so well in practice. This is because hardware
malfunctions, which are more often than not the causes of system
failure, will bring all the virtual machines down simultaneously.
Strictly  speaking  this  should  not  be  taken  as  the  demerit  of 
this  approach  because  the  proposed  observer  is  not  supposed 
to  detect  the  hardware  malfunctioning.  However,  this  approach 
does  make  it  difficult  to  pinpoint  the  cause  of  failure. 

Another  approach  to  providing  the  independent  observer  could  be 
to  put  it  on  a separate  small  satellite  computer  connected  to 
the main processor. The observer can then run independently of
the system software as well as hardware, and regardless of the
hardware or software nature of the failure the observer provides the
expected service.

The idea of satellite computers is nothing new. In the past they
have been extensively used as I/O interfaces better known as
peripheral processors. For example, the CDC-6600 uses 10 peripheral
processors around its main processor (McW76). Satellite processors
have also been used as measurement tools to monitor the
performance of computer systems. An example is a HEMI monitor
implemented on one of the peripheral processors of the CYBER
system (Svo76).


The scheme under consideration here is to use a satellite
processor to keep track of the states of the system software on
the main processor. The primary purpose of these observations
is to help debug the system software. Under normal running
periods the record of system states can also be used to measure
and monitor the system performance. Therefore, we suggest the
use of the acronym SODEM (Satellite Observer DEbugger and
Monitor) for this new role of satellite computers. Although
the term SODEM refers to the union of satellite computer
hardware, its software and its interface with the main processor,
we will frequently use it to denote the software alone.


The SODEM computer does not necessarily have to be a computer
smaller than the main computer, nor is it necessary for it to
be dedicated to observation. In fact, it can be any
other computer connected to the observed system by a hardware
link. The existence of such links is becoming very common
these days. In the light of the recent rapid growth of computer
networks, and the tremendous success of ARPANET*, the importance of
the SODEM approach is obvious.


At Harvard University, we have developed an experimental SODEM.
Our Aiken Computation Laboratory has a TOPS-10 monitor on a
DECsystem-10 connected to ARPANET. Also, it has hardware links
connecting it to a PDP-1, a PDP-11/10, a PDP-11/40, a PDP-11/70,
and a GT-10 computer. We chose the PDP-11/10 as the satellite
processor to implement the experimental SODEM called SODEM-1011
(SODEM-ten-eleven) and to study various aspects including the
feasibility of this approach. The details of this implementation
and the knowledge that we have gained from it are the topic
of this report.


* In the ARPANET the IMPs keep observing the neighboring IMPs.
When an IMP comes down, the neighbors not only inform other
IMPs in the network but also reload the IMP if possible. In
this sense IMPs provide the observation and monitoring functions
of SODEMs.
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II.  Design  Considerations 

The  three  main  components  involved  in  a SODEM  are  the  system 
to  be  observed,  the  link,  and  the  satellite  processor.  In 
the  following,  we  discuss  how  the  choice  of  each  of  these 
components  affects  the  design  and  capability  of  a SODEM.  The 
ideas  have  been  illustrated  with  our  own  experience  from  the 
design  of  SODEM-1011. 

2.1 Data Selection and Data Rate

Data selection involves determining exactly what information
should be observed, i.e., what constitutes the system state
and how it can be extracted and recorded. Strictly speaking,
the system state is the state of all the system's memories (main
memory, secondary memory, etc.). Continuous collection of all
this information is neither feasible nor desirable. Therefore,
one has to identify a small subset of important memory locations
(such as control registers, operating system tables, etc.)
for observation and recording. Also there is the problem of
data availability. Some information, although important, may
not be obtainable.

Closely coupled with the question of data selection is the
question of data rate, i.e., how often the information should be
collected. This rate could be different for different subsets
of collected data. For example, CPU status bits may have to
be observed every instruction cycle whereas teletype status
could be observed every few milliseconds or so.
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In  case  of  SODEM-1011  we  have  chosen  to  observe  key  tables  that 
the TOPS-10 monitor keeps permanently in core. This does
not include many of the disk management tables which are kept
on the disk or the job data areas which are swapped in and out
with the job. This choice is quite arbitrary and is based on
the belief that in the case of a crash these other tables could
be very reliably obtained from the secondary storage. Depending
upon its variability, each monitor table has been assigned a
cycle time at which it is to be recorded.
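The idea of a per-table cycle time can be sketched as follows. This is only an illustration in Python: the report does not give the actual cycle times or the scheduling mechanism of SODEM-1011, so the tick-based scheme, the function names, and the periods used in the example are assumptions. The table names JBTSTS and JBTSWP are taken from the command examples later in the report.

```python
def due_tables(cycle_ticks, tick):
    """Names of the tables to record at this clock tick.

    cycle_ticks maps a table name to its cycle time, expressed as a
    whole number of clock ticks (a hypothetical unit). A table is due
    whenever the tick count is a multiple of its cycle time.
    """
    return sorted(name for name, period in cycle_ticks.items()
                  if tick % period == 0)


def observations_per_table(cycle_ticks, horizon):
    """How many times each table is recorded during ticks 1 .. horizon."""
    return {name: horizon // period for name, period in cycle_ticks.items()}
```

A rapidly changing table would be given a short cycle time and a slowly changing one a long cycle time, which keeps the data rate across the link bounded.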

2.2 Communication Link

Ideally  the  link  connecting  the  SP  (Satellite  Processor)  to 
MP  (Main  Processor)  should  be  such  that  the  SP  can  access  the 
complete  main  memory  of  MP.  The  second  best  choice  is  to 
let  the  MP  have  privilege  to  read/write  SP  memory.  In  the 
latter  case,  SP  sends  an  interrupt  signal  to  MP  whenever  it  wants 
some  portion  of  its  memory  to  be  read. 

The  hardware  link  connecting  our  PDP-10  and  PDP-11,  called 
X-BUS,  was  designed  much  before  the  SODEM  project  began.  It 
provides  a master-slave  relationship  between  the  PDP-10  and 
the  PDP-11.  To  be  more  precise,  the  PDP-10  can  use  the  X-BUS 
to  read/write  the  PDP-11  memory,  to  start/stop/interrupt  the 
PDP-11 or to sense the state of the PDP-11. The PDP-11 cannot use
it in the same way to control the PDP-10. The usual way for
the PDP-11 to communicate with the PDP-10 is to write its message in a
buffer and wait till the PDP-10 reads it. This is a rather
undesirable situation because it not only makes communication slow but
also makes the observer depend upon the observed to get
the data.

2.3  Satellite  Processor 

Generally the satellite processor is much smaller and slower
than the main processor (for obvious economic reasons).
However, it is desirable that the two processors have the same
word or byte size. This saves a lot of overhead in data
transformation and manipulation.

In the design of SODEM-1011 we were faced with completely
incompatible word sizes: 36 bits in the PDP-10 and 16 bits
in the PDP-11. This means that 4.5 bytes (8 bits each) are required
in the PDP-11 to store one PDP-10 word. Thus, we had to write
a complete set of data manipulation routines to add, subtract,
multiply, and divide these odd-size numbers. We did save some time
by sacrificing some PDP-11 memory and using 5 bytes for each
word instead of 4.5 bytes.

III. Structure of a SODEM Software

The basic functions of a SODEM software are the following:

1. To collect data at MP.

2. To transfer the data to SP.

3. To transform it into a form suitable for SP.

4. To display it to the human user at SP debugging
the software.

5. To record the data in a permanent storage for
future use.

Thus, a SODEM software should have at least
5 modules, one for each of the above 5 functions. In addition
it should have a module where it keeps information on when and
what is to be observed. The interaction between these various
modules is shown in Figure 3.1. The details of these modules
with examples from SODEM-1011 are now explained.

3.1  Data  Definition  Module 

As  said  above,  this  module  contains  information  on  what  is 
to  be  observed,  and  when  it  is  to  be  observed.  In  case  of 
SODEM-1011,  this  consists  of  a list  of  monitor  tables  to 
be observed. Each table has a name assigned to it, and
consists of several words called entries. Each entry has
several bytes, each a collection of a variable number of bits.

Figure 3.2: Data Structure Diagram of SODEM-1011 Data Base

The relationship between table, entry and byte can be very
precisely displayed by a DBTG type data structure diagram shown
in Figure 3.2. The user can ask for a complete table or a
particular entry in the table, or a particular byte.

The  data  definition  module  is  thus  a schema  for  the  data  to 
be  collected. 
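The table, entry and byte hierarchy described in this section could be modeled along the following lines. This Python sketch is not the SODEM-1011 schema, which the report does not list: the class names, the field names, and the convention of locating a byte by its width and by the number of bits to its right within the 36-bit word are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ByteField:
    """A named group of bits inside a 36-bit entry."""
    name: str
    right_bits: int   # number of bits to the right of the field
    size: int         # width of the field in bits

    def extract(self, word):
        return (word >> self.right_bits) & ((1 << self.size) - 1)


@dataclass
class Table:
    """A named monitor table: a list of 36-bit entries plus its byte layout."""
    name: str
    byte_fields: dict                    # field name -> ByteField
    entries: list = field(default_factory=list)

    def entry(self, number):
        """Return an entry by its 1-based number."""
        return self.entries[number - 1]

    def byte(self, number, field_name):
        """Return one byte (bit field) of one entry."""
        return self.byte_fields[field_name].extract(self.entry(number))
```

With such a schema the three kinds of request mentioned above (a whole table, one entry, one byte) map onto `entries`, `entry(number)` and `byte(number, field_name)` respectively.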


3.2  Data  Collection  Module 

The  function  of  this  module  is  to  collect  the  data  from  MP 
memory.  If  the  communication  link  between  MP  and  SP  is  such 
that  SP  can  directly  read  any  location  in  MP  memory  then  this 
module  would  not  be  required. 

In SODEM-1011, the data collection module resides on the PDP-10
and reads in the data from PDP-10 memory. Currently, it runs
as a user program; therefore it has to use monitor service
calls known as GETTAB UUOs to get monitor tables. However,
eventually it will reside in a fixed portion of core and
have the privilege to read all of core directly.

3.3  Communication  Module 

This  module  handles  the  communication  link  between  MP  and  SP. 
This  means  that  it  handles  interrupts  in  either  direction, 
and  checks  the  data  for  correct  transmission.  It  makes  use 
of  the  system  clock  to  do  periodic  observations. 
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In  SODEM-1011  this  module  resides  partly  in  PDP-10  and  partly 
in PDP-11. This is because of the peculiarity of our X-BUS
which does not allow the PDP-11 to read/write PDP-10 memory or
interrupt it. Therefore, the PDP-11 portion of the communication
module writes the data request messages in fixed locations to
be read and serviced later by the PDP-10 portion.

3.4  Data  Transformation  and  Manipulation  Module 

This  module  transforms  data  from  the  format  used  on  MP  to  that 
on  SP.  It  would  not  be  required  if  both  the  processors  had 
identical  word  sizes.  Unfortunately,  in  SODEM-1011  the  word 
sizes  are  not  only  different  they  are  highly  incompatible. 

As said before, we use 5 bytes in PDP-11 to store one PDP-10
word. Thus the data transformation module consists of routines
to interpret these 5-byte words in ASCII, SIXBIT, octal, and
floating point formats. It also contains routines for arithmetic
operations like addition, subtraction, multiplication, and
division on these odd-size numbers.
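The kind of routines this module needs can be sketched in Python as follows. This is an illustration, not the SODEM-1011 code, which was written for the PDP-11: the byte order inside the 5-byte cell (most significant byte first, top 4 bits unused) and the function names are assumptions. The SIXBIT convention used is the standard one, in which each 6-bit code is the ASCII code minus 32 and the first character occupies the leftmost 6 bits of the word.

```python
WORD_MASK = (1 << 36) - 1   # a PDP-10 word has 36 bits


def pack_word(word):
    """Store a 36-bit word in 5 bytes (40 bits, the top 4 bits unused)."""
    return (word & WORD_MASK).to_bytes(5, "big")


def unpack_word(raw):
    """Recover the 36-bit word from its 5-byte cell."""
    return int.from_bytes(raw, "big") & WORD_MASK


def text_to_sixbit(text):
    """Pack up to six characters into one word, padded with spaces."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 32)
    return word


def sixbit_to_text(word):
    """Interpret a word as six SIXBIT characters."""
    return "".join(chr(((word >> shift) & 0o77) + 32)
                   for shift in range(30, -1, -6))


def add_words(a, b):
    """36-bit addition with wraparound, as in two's complement arithmetic."""
    return (a + b) & WORD_MASK


def octal(word):
    """Twelve-digit octal rendering of a 36-bit word."""
    return format(word & WORD_MASK, "012o")
```

Using a whole number of bytes per word, as the report describes, trades 4 wasted bits per word for much simpler addressing: the nth word always starts at byte offset 5n.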

3.5  Command  Decoder 

This module provides the debugging facility to the human
programmer. It is recommended that its command structure be
similar to that of other debugging programs used at the
installation. For example, in SODEM-1011 the commands are very
similar to those of DDT, used for debugging assembly language programs.
Thus "JBTSTS/" types the table JBTSTS, and "JBTSWP2/" types the
second entry in the table JBTSWP. A linefeed types the next
entry and an up-arrow "^" types the previous entry. Also there
are commands to display the last typed data in octal, sixbit,
decimal, and half-word formats. A short description of SODEM-1011
commands is given in Appendix A.
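A decoder for the few commands mentioned above might look like the following Python sketch. It is hypothetical throughout: the parsing rule (a name made of letters, optionally followed by a 1-based entry number, then a slash) is inferred from the two examples "JBTSTS/" and "JBTSWP2/" and is not the documented SODEM-1011 syntax, and the class and method names are invented for illustration.

```python
import re

# Assumed syntax: letters, an optional entry number, then "/".
COMMAND = re.compile(r"^([A-Z]+)(\d*)/$")


class CommandDecoder:
    def __init__(self, tables):
        self.tables = tables      # table name -> list of entries
        self.position = None      # (table name, 1-based entry number)

    def execute(self, command):
        """Return a whole table, or one entry, for the given command."""
        if command in ("\n", "^"):          # linefeed: next, up-arrow: previous
            if self.position is None:
                raise ValueError("no entry has been typed yet")
            name, number = self.position
            step = 1 if command == "\n" else -1
            return self._entry(name, number + step)
        match = COMMAND.match(command)
        if match is None or match.group(1) not in self.tables:
            raise ValueError("unrecognized command: %r" % command)
        name, digits = match.groups()
        if digits == "":
            return list(self.tables[name])  # the complete table
        return self._entry(name, int(digits))

    def _entry(self, name, number):
        entries = self.tables[name]
        if not 1 <= number <= len(entries):
            raise IndexError("entry %d is outside table %s" % (number, name))
        self.position = (name, number)
        return entries[number - 1]
```

Keeping the current position in the decoder is what lets a bare linefeed or up-arrow step through a table without retyping its name.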


3.6 Data Recording Module


This  module  writes  the  data  on  to  a secondary  storage  device 
for  later  use.  In  SODEM-1011,  unfortunately,  there  is  no 
secondary  storage  device  on  the  satellite  computer  PDP-11. 
Hence  no  recording  is  currently  done.  However,  there  exists 
a virtual  machine  monitor  designed  previously  for  this  PDP-11 
and  PDP-10  combination.  The  VMM  allows  virtual  I/O  facility 
to  PDP-11  users  in  the  sense  that  with  its  help  they  can 
read  or  write  on  PDP-10  devices.  Eventually,  we  plan  to  use 
this  facility  to  record  the  SODEM-1011  data  on  PDP-10  disk. 
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IV.  Conclusion 

A satellite computer can be a very useful tool for system
software debugging. As a byproduct, it also helps measure
system performance. The chief considerations in the design of
a SODEM (Satellite Observer DEbugger and Monitor) are deciding
what data should be observed, the design of the connecting
link, and the choice of an appropriate satellite processor.
The main structural modules of SODEM software are the data
collection module, communication module, data transformation
module, data display module, data recording module, and data
definition module.

A SODEM  has  been  implemented  successfully  at  Harvard  University 
to  observe  a PDP-10  using  a PDP-11  as  a satellite  processor. 

The  different  word  sizes  of  the  two  computers  have  increased 
the  size  of  the  software.  Also  the  uni-directional  nature  of 
the  X-BUS  connecting  the  two  computers  has  made  the  SODEM  a 
little  slow  and  dependent  on  the  observed  software.  Nonetheless, 
it  has  demonstrated  the  feasibility  of  this  approach. 

The main advantage of the SODEM approach is that the observer
remains intact even when there are hardware or software failures
of the observed system. However, like most other system
software, it is generally not portable. It can be used only
with a specific combination of main processor, operating system,
and satellite processor.

Appendix A - Using SODEM-1011


This appendix describes the working of the experimental SODEM
developed at Harvard to observe a DECsystem-10 using a PDP-11
satellite processor. As explained before, the state of the
TOPS-10 monitor system is given largely by the contents of its
various monitor tables. SODEM-1011 lets a user sitting at the
PDP-11 teletype look at almost all the monitor tables during
normal operation or after a crash. The user is free to
observe a particular table, any one entry in a table, or any
bit or set of bits in an entry. Of course, the user must be
familiar with the internal structure and working of the TOPS-10
monitor to be able to find the bugs or to make any sense of
these observations.


The command structure of SODEM-1011 has been deliberately
kept very similar to that of DDT, which is used here for
debugging other programs. For example, an entry name followed
by "/" displays that entry, a linefeed displays the next entry,
and so on.

The  key  element  in  learning  to  use  SODEM-1011  is  to  know 
how  to  specify  the  name  of  the  data  to  be  observed. 


A.1 Name Specification


Each table in the TOPS-10 monitor has a name associated with
it. For example, the table containing the status of various
jobs is called 'JBTSTS'.

These tables, in general, are several words long. The term
ENTRY denotes a word or a group of consecutive words having
similar structure. For example, the first five words of the
configuration table give the name of the system. This entry
is named '.CNFG0'. The names of various entries are
described in the DECsystem-10 documentation (DEC74). Entries
that do not have a name have to be specified by their
relative location in the table.

The different bits or sets of bits in each word of the
table have different significance. For example, the 3rd bit of
the job status word indicates whether that job number has been
assigned to a job; similarly, bits 10 thru 14 of this word
specify the wait status code of the job. The term BYTE is used
to denote such a bit or set of bits. Many bytes have names
assigned, and many do not. For example, the 3rd bit of the job
status word is called 'JNA'. Again, these names can be found
either from monitor listings or from system documentation.
Also, the help command (see below under commands) of SODEM-1011
can be used to get the names of tables, entries, or bytes.
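
As an illustration of what displaying a BYTE involves, the following Python sketch extracts a bit field from a 36-bit word. It uses the DEC convention that bit 0 is the most significant bit of the word. Whether "the 3rd bit" in the text means bit 3 or bit 2 under that numbering is not stated, so the example treats it as bit 3; this and the function name are assumptions.

```python
WORD_BITS = 36


def extract_byte(word, first_bit, last_bit=None):
    """Return bits first_bit..last_bit of a 36-bit word.

    Bits are numbered 0 (most significant) to 35 (least significant).
    With last_bit omitted, a single bit is returned.
    """
    if last_bit is None:
        last_bit = first_bit
    if not 0 <= first_bit <= last_bit < WORD_BITS:
        raise ValueError("bit range must lie within 0..35")
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit   # distance of the field from bit 35
    return (word >> shift) & ((1 << width) - 1)


# Hypothetical job status word: bit 3 set, wait status code 25 (octal)
# in bits 10 thru 14.
status = (1 << (35 - 3)) | (0o25 << (35 - 14))
job_assigned = extract_byte(status, 3)
wait_code = extract_byte(status, 10, 14)
```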

A name specification consists of a table name followed by an
entry name or number, followed by a byte name or number. The
character "_" (back arrow, or underline) is used as a separator
between table and entry names, and between entry and byte names.
A few examples of name specifications are shown in Table A.1.

Table A.1: Examples of Name Specifications

S. No.  Specification    Meaning

1.      CNFTBL/          Display the whole table.
2.      CNFTBL_.CNDBG/   Display the debugging status word
                         entry only.
3.      CNFTBL_4_3/      Display the 3rd bit of the 4th
                         word in the table.
4.      .CNDBG/          Same as in 2.
5.      RUN/             Display RUN bits of all entries
                         in the job status table.
6.      JBTSTS__3/       Display 3rd bit of all entries
                         in the table.
7.      CNFTBL/          Same as 1. Use the same format
                         for all entries.
8.      CNFTBL_/         Display each entry in its
                         associated format.
9.      CNFTBL__/        Display all bytes of all entries
                         in their (byte's) format.

Since the names are unique, one can omit the table (entry) name
if the entry (byte) name is specified. For example, see
specifications 4 and 5 in Table A.1. The entry or byte names can
also be null, in which case all the entries or bytes satisfying
the specification are displayed (see example 6 in Table A.1).
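
The naming rules above can be sketched as a small parser. This Python illustration is not the SODEM-1011 implementation. It assumes that "_" is the separator, that two adjacent separators denote a null (match-all) name, and that a lone entry or byte name is resolved through a dictionary of unique names; the dictionary contents shown are hypothetical placeholders.

```python
from typing import NamedTuple, Optional


class NameSpec(NamedTuple):
    table: Optional[str]
    entry: Optional[str]   # None = level not specified, "" = null (all)
    byte: Optional[str]    # None = level not specified, "" = null (all)
    command: str


# Hypothetical excerpts of the data definition dictionary.
TABLES = {"CNFTBL", "JBTSTS"}
ENTRY_TO_TABLE = {".CNDBG": "CNFTBL"}
BYTE_TO_TABLE = {"RUN": "JBTSTS"}


def parse_spec(text: str) -> NameSpec:
    """Split 'TABLE_ENTRY_BYTE<command>' into its levels."""
    name, command = text[:-1], text[-1]
    parts = name.split("_")
    if len(parts) > 3:
        raise ValueError("at most table, entry, and byte may be given")
    if len(parts) == 1 and parts[0] not in TABLES:
        # Table (and possibly entry) name omitted: names are unique,
        # so the dictionary tells us where the name belongs.
        lone = parts[0]
        if lone in ENTRY_TO_TABLE:
            return NameSpec(ENTRY_TO_TABLE[lone], lone, None, command)
        if lone in BYTE_TO_TABLE:
            return NameSpec(BYTE_TO_TABLE[lone], "", lone, command)
        raise KeyError(f"unknown name: {lone}")
    parts += [None] * (3 - len(parts))
    return NameSpec(parts[0], parts[1], parts[2], command)
```

Distinguishing "not specified" (`None`) from "null" (`""`) preserves the difference between a specification such as `CNFTBL/` and one such as `CNFTBL_/`.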

A.2 SODEM Commands

The information in the PDP-10 can be in several different
formats, e.g., ASCII, SIXBIT, octal, decimal, etc. Therefore,
the data transformation module of SODEM-1011 lets the user view
the information in any format he desires. The format is
specified by the command character following the name
specification. For example, the "*" command types the data in
octal format.

The data definition module, which keeps a dictionary of all
data names, also has a format for each piece of data. Thus each
table, entry, or byte has a particular type-out format assigned
to it. For example, the entry ".CNOPR" (name of the OPR TTY)
is to be typed in SIXBIT, whereas ".CNNSM" (number of
nanoseconds per memory cycle) is typed in decimal. The "/"
(slash) command displays the data in the format specified in
the data definition module. The type-out routines chosen are
those associated with the lowest level specified (even though
that level specification may be null) in the name specification.
For example, see specifications 7 thru 9 in Table A.1.
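
The "lowest level specified" rule can be illustrated with a short Python sketch covering the table and entry levels only. The formats for ".CNOPR" and ".CNNSM" are those given above; the table-level format (octal) and the dictionary layout are assumptions made for the example.

```python
# Hypothetical excerpt of the data definition dictionary.
ENTRIES = {"CNFTBL": [".CNOPR", ".CNNSM"]}
TABLE_FORMAT = {"CNFTBL": "octal"}                          # assumed
ENTRY_FORMAT = {".CNOPR": "sixbit", ".CNNSM": "decimal"}    # from the text


def slash_formats(table, entry=None):
    """Map each entry selected by a "/" command to its type-out format.

    entry=None : no entry level given ("CNFTBL/"), so the table is
                 the lowest level and its format is used throughout.
    entry=""   : null entry level ("CNFTBL_/"), so every entry is
                 typed in its own associated format.
    otherwise  : a single named entry, typed in its own format.
    """
    if entry is None:
        return {name: TABLE_FORMAT[table] for name in ENTRIES[table]}
    if entry == "":
        return {name: ENTRY_FORMAT[name] for name in ENTRIES[table]}
    return {entry: ENTRY_FORMAT[entry]}
```

A byte level would follow the same pattern with one more dictionary.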



Table A.2: List of SODEM-1011 Commands

S. No.  Command  Function

1.      /        Normal type out.
2.      #        Non-zero entries only.
3.      &        SIXBIT format type out.
4.      •        ASCII format.
5.      $        Decimal format.
6.      -        Octal format.
7.      %        Half word format.
8.      1        Binary format.
9.      <LF>     Type the next entry in the table.
10.     ^        Type the entry before the current one.
11.     <CR>     Do nothing. Just prompt again.
12.     ?        Help.

Very often one wants to skip over the zero data values. For
example, if only 3 jobs are currently logged in, there is no
point in typing the whole job status table. The "#" (number)
command of SODEM-1011 facilitates this. It is identical to the
slash command except that only non-zero values are displayed.


There are many more commands accepted by SODEM-1011. Further,
the modular structure of the SODEM allows for easy extension
of the command table, if needed. The current list of commands
is given in Table A.2. One command which is very helpful is
the "?" (help) command. It helps the user by listing the
various options that he has at a particular level. Thus, if a
question mark is typed after specifying, say, a table and an
entry name, SODEM-1011 will display the byte names associated
with that entry. The list of available commands, entry names,
and table names can be obtained similarly.
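
An extensible command table of this kind can be sketched as a dictionary that maps each command character to a description, a type-out routine, and a flag for skipping zero values. This Python sketch is illustrative only: the data layout and names are assumptions, "/" is simplified to octal type-out, and "?" lists commands only, not entry or byte names.

```python
def type_octal(word):
    return format(word, "o")


def type_decimal(word):
    return str(word)


# command character -> (description, type-out routine, skip zero values?)
COMMANDS = {
    "/": ("normal type out", type_octal, False),
    "#": ("non-zero entries only", type_octal, True),
    "$": ("decimal format", type_decimal, False),
}


def run_command(char, name, words):
    """Apply one command to the entries of a table and return the lines."""
    if char == "?":                      # help: list the available commands
        return [f"{c}  {desc}" for c, (desc, _, _) in COMMANDS.items()]
    _, formatter, skip_zero = COMMANDS[char]
    return [
        f"{name}_{i}/{formatter(word)}"
        for i, word in enumerate(words)
        if word or not skip_zero
    ]
```

Adding a command then amounts to adding one entry to `COMMANDS`; the decoder itself does not change.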


A.3 Sample Protocols




The  effectiveness  of  SODEM-1011  in  helping  the  user  to  debug 
the  system  depends  upon  his  familiarity  with  the  internal 
structure  of  the  system.  With  enough  familiarity  and  imagination, 
one  can  easily  extract  some  very  useful  information  about  the 
system  (and  the  bugs).  We  illustrate  it  with  a sample  protocol. 
This  is  just  a very  small  subset  of  the  possible  questions 
one  might  want  to  ask  after  a system  crash.  The  user  type-in 
is  underlined  and  our  comments  follow  semicolons. 

Question:  List all the users that were logged in.
Protocol:  > .PDNM1#           ;TYPE NON-ZERO NAMES (FIRST HALF).
           .PDNM1_1/OPERA
           .PDNM1_3/JAIN
           .PDNM1_4/OPERA
Answer:    Operator was logged in as job #1 and 4, and Jain
           as #3.

Question:  Which jobs were privileged to POKE the monitor?
Protocol:  > JP.POK#           ;TYPE NON-ZERO POKE PRIV BITS.
           JBTPRV_1_JP.POK/1
           JBTPRV_4_JP.POK/1
Answer:    Jobs 1 and 4 had the privilege.

Question:  Which job poked last?
Protocol:  > .CNPUC/           ;TYPE THE JOB # AND # OF POKES
           CNFTBL_.CNPUC/ 4,16
           > ^                 ;TYPE THE ENTRY BEFORE .CNPUC
           CNFTBL_.CNPOK/ 0,2300
Answer:    Job #4 poked location 2300. Total number of pokes
           was 16 (octal).

Question:  Who, if any, meddled with his sharable segment?
Protocol:  > MEDDLE#           ;TYPE NON-ZERO MEDDLE BITS
           JBTSGN_3_MEDDLE/1
Answer:    Job #3 meddled with his high segment.

A Sample Protocol with SODEM-1011